DroidsOnRoids / jspoon

Annotation based HTML to Java parser + Retrofit converter
https://www.thedroidsonroids.com/blog/scraping-web-pages-with-retrofit-jspoon-library
MIT License

I cannot parse a list of values #51

Closed: KonstantinKustau closed this issue 6 years ago

KonstantinKustau commented 6 years ago

Hello. I want to get a list of all the "Transaction Fee" values on this page: https://www.binance.com/fees.html

I'm using the following field:

@Selector(value = ".accountInfo-lists li.td .items > div.fullName")
public List<String> posts;

But I get only one element, the literal string {{asset.assetName}}.

How do I get a list of all the "Transaction Fee" values?

cr3ativ3 commented 6 years ago

That's because the table is populated by an AJAX request (JavaScript) to https://www.binance.com/assetWithdraw/getAllAsset.html, so the HTML that jspoon receives contains only the unrendered template. If you disable JavaScript and open https://www.binance.com/fees.html, you will see that the accountInfo-lists list looks like this:


<ul class="accountInfo-lists">
    <li class="th">
        <div class="items f-cb">
            <div class="coin f-left color9">{{'Coin' | T}}</div>
            <div ng-if="cur_lang=='cn'" class="fullName f-left color9">币种全称</div>
            <div ng-if="cur_lang=='cn'" class="total f-left color9">最小提币数量</div>
            <div ng-if="cur_lang=='cn'" class="useable f-right color9">提币手续费</div>
            <div ng-if="cur_lang!='cn'" class="fullName f-left color9">Name</div>
            <div ng-if="cur_lang!='cn'" class="total f-left color9">Minimum Withdrawal</div>
            <div ng-if="cur_lang!='cn'" class="useable f-right color9">Transaction Fee</div>
        </div>
    </li>
    <li class="td" ng-repeat="asset in asset">
        <div class="items f-cb">
            <div class="coin f-left"><img ng-src="{{asset.logoUrl}}">{{asset.assetCode}}</div>
            <div class="fullName f-left">{{asset.assetName}}</div>
            <div class="total f-left">{{asset.minProductWithdraw | rate}}</div>
            <div class="useable f-right">{{asset.transactionFee}}  {{asset.assetCode}}</div>    
        </div>
    </li>
</ul>
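A minimal, stdlib-only sketch of the two checks this implies. First, detect that scraped values are still AngularJS placeholders such as {{asset.assetName}}, which means the data is rendered client-side. Second, read the fee values from the JSON returned by the AJAX endpoint instead of the HTML. The class and method names are hypothetical, and the "transactionFee" field name is an assumption inferred from the {{asset.transactionFee}} binding above, not a documented format. The regex extraction only keeps the example dependency-free; real code would more likely parse the response with a JSON library (for example, Retrofit with a Gson converter).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScrapeCheck {

    // Matches an unrendered AngularJS interpolation such as {{asset.assetName}}.
    private static final Pattern ANGULAR_PLACEHOLDER = Pattern.compile("\\{\\{.*?\\}\\}");

    // ASSUMPTION: each asset object in the JSON has a "transactionFee" field,
    // as the {{asset.transactionFee}} binding suggests. Accepts quoted or bare values.
    private static final Pattern FEE_FIELD =
            Pattern.compile("\"transactionFee\"\\s*:\\s*\"?([^\",}\\s]+)\"?");

    /** Returns true if any scraped value is still a client-side template placeholder. */
    public static boolean containsUnrenderedTemplate(List<String> scraped) {
        for (String value : scraped) {
            if (ANGULAR_PLACEHOLDER.matcher(value).find()) {
                return true;
            }
        }
        return false;
    }

    /** Collects every "transactionFee" value from raw JSON text, in document order. */
    public static List<String> extractTransactionFees(String json) {
        List<String> fees = new ArrayList<>();
        Matcher m = FEE_FIELD.matcher(json);
        while (m.find()) {
            fees.add(m.group(1));
        }
        return fees;
    }

    public static void main(String[] args) {
        // What the @Selector field received from the static page.
        System.out.println(containsUnrenderedTemplate(List.of("{{asset.assetName}}"))); // true

        // Made-up sample in the assumed shape, not real Binance data.
        String sample = "[{\"assetCode\":\"AAA\",\"transactionFee\":1.5},"
                + "{\"assetCode\":\"BBB\",\"transactionFee\":\"0.25\"}]";
        System.out.println(extractTransactionFees(sample)); // [1.5, 0.25]
    }
}
```

If the first check returns true, pointing your request (or a Retrofit interface) at the JSON endpoint is usually simpler than trying to scrape the rendered page.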

KonstantinKustau commented 6 years ago

@cr3ativ3 Thanks for the answer. I'd like to ask you something else. I need a library that reads text from the monitor screen and parses it. Could you share your experience? What do you think is the easiest way to do this?

cr3ativ3 commented 6 years ago

@CoOstOFF you should try an OCR (Optical Character Recognition) library or engine. They are usually quite large and resource-intensive, since they are essentially machine learning. I have heard good things about ABBYY, but it's proprietary, although it offers a cloud solution where its servers do the heavy lifting. A popular open-source option is the Tesseract engine (it has Tess4J Java wrappers/bindings), but be prepared to train your models well to get good accuracy. Asprise OCR isn't free, but it's also Java.

KonstantinKustau commented 6 years ago

@cr3ativ3 Okay. Thank you for your advice.