emory-libraries / ezpaarse-platforms

Platforms parsers, scrapers and PKBs for ezPAARSE

China Online Journals / Wanfang Data #42

Closed eipeele closed 4 years ago

eipeele commented 6 years ago

Example:

www.wanfangdata.com.cn.proxy.library.emory.edu/

Domains:

ad.wanfangdata.com.cn
advert.wanfangdata.com.cn
c.g.wanfangdata.com.cn
c.wanfangdata.com.cn
cdn.wanfangdata.com.cn
check.wanfangdata.com.cn
d.g.wanfangdata.com.cn
d.old.wanfangdata.com.cn
d.oldg.wanfangdata.com.cn
d.wanfangdata.com.cn
f.g.wanfangdata.com.cn
f.wanfangdata.com.cn
fz.wanfangdata.com.cn
g.wanfangdata.com.cn
img-sns-alioss.wanfangdata.com.cn
librarian.wanfangdata.com.cn
login.med.wanfangdata.com.cn
login.wanfangdata.com.cn
med.wanfangdata.com.cn
medad.med.wanfangdata.com.cn
message.wanfangdata.com.cn
miner.wanfangdata.com.cn
my.wanfangdata.com.cn
new.wanfangdata.com.cn
old.wanfangdata.com.cn
oldcheck.wanfangdata.com.cn
oss.wanfangdata.com.cn
s.g.wanfangdata.com.cn
s.wanfangdata.com.cn
service.med.wanfangdata.com.cn
social.old.wanfangdata.com.cn
social.wanfangdata.com.cn
static.wanfangdata.com.cn
subject.med.wanfangdata.com.cn
tran.wanfangdata.com.cn
trend.wanfangdata.com.cn
work.wanfangdata.com.cn
www.wanfangdata.com.cn
wanfangdata.com.cn

Priority:

Low

Subscriber (Library):

Woodruff

ezPAARSE

Analysis: None
Trello: None

CB987 commented 4 years ago

Putting this note here so I don't forget; I want to double-check. As noted above, this site has a whole bunch of domains. When building the parser I noticed that the proxy was dropped on miner.wanfangdata.com.cn, so I checked the stanza, and all it has is:

T China Online Journals
U http://c.g.wanfangdata.com.cn/Periodical.aspx
DJ wanfangdata.com.cn

... so do we need to update the stanza with all the above domains?
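
One way to sanity-check the question is to test each observed host against the stanza's domain. The sketch below assumes that an EZproxy `DJ` (or `D`) line matches a host when the host equals the domain or ends with it as a subdomain suffix; that assumption should be confirmed against the EZproxy documentation. The helper name `covered_by` and the short host list are illustrative only.

```python
def covered_by(host: str, domain: str) -> bool:
    """Return True if host equals domain or is a subdomain of it.

    ASSUMPTION: this mirrors how an EZproxy D/DJ line is matched;
    it is not taken from the EZproxy source or this repository.
    """
    host = host.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


STANZA_DOMAIN = "wanfangdata.com.cn"  # from the DJ line in the stanza

# A few hosts from the domain list and the log examples in this issue.
hosts = [
    "miner.wanfangdata.com.cn",
    "login.med.wanfangdata.com.cn",
    "video.wanfangdata.com.cn",
    "wanfangdata.com.cn",
]

# Hosts that the single DJ line would NOT cover under the assumption above.
uncovered = [h for h in hosts if not covered_by(h, STANZA_DOMAIN)]
print(uncovered)
```

If the list comes back empty, the existing `DJ` line would already cover every subdomain under that assumption, and the dropped proxy on miner would need another explanation.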

CB987 commented 4 years ago

SEARCH;HTML;
http://www.wanfangdata.com.cn:80/search/searchList.do?searchType=all&searchWord=NZFO
http://www.wanfangdata.com.cn:80/search/searchList.do?searchType=all&showType=&pageSize=&searchWord=unicorn&isTriggerTag=
http://www.wanfangdata.com.cn:80/search/searchList.do?searchType=tech&showType=&pageSize=&searchWord=oxygen&isTriggerTag=
http://www.wanfangdata.com.cn:80/search/searchList.do?searchType=patent&showType=detail&pageSize=20&searchWord=%E7%94%B3%E8%AF%B7%E4%BA%BA%2F%E4%B8%93%E5%88%A9%E6%9D%83%E4%BA%BA%3A
http://video.wanfangdata.com.cn:80/s/search/?conditionString=CWX12-0-0-0-0-0-0-0-1-0-0/&
http://www.wanfangdata.com.cn:80/search/searchList.do?searchType=standards&searchWord=%E6%A0%87%E5%87%86%E5%88%86%E7%B1%BB:

REF;HTML; (or ABSTRACT?)
http://www.wanfangdata.com.cn:80/details/detail.do?_type=perio&id=MBD2%255C%255CMBD%255C%255CMBD2%255C%255CS1755267209001183h.xml
http://www.wanfangdata.com.cn:80/details/detail.do?_type=perio&id=10.1111%252Fnous.12161 (id = DOI sometimes, but not always)
http://www.wanfangdata.com.cn:80/details/detail.do?_type=perio&id=MBD2%255C%255CMBD%255C%255CMBD2%255C%255CS1755267209001183h.xml
http://www.wanfangdata.com.cn:80/details/detail.do?_type=degree&id=D01099412
http://www.wanfangdata.com.cn:80/details/detail.do?_type=tech&id=Jl0kLKWlc2lzN0Viqj4JSDM3U6XQBlvgZEbZJjOGkFY%253D
http://www.wanfangdata.com.cn:80/details/detail.do?_type=perio&id=zgxxws201609006
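
The `id` values in these detail URLs look percent-encoded twice (`%252F` rather than `%2F`), so a DOI only becomes recognizable after two rounds of decoding. A minimal sketch of that step, written in Python for illustration only (the helper names `detail_id` and `looks_like_doi` are hypothetical, and the DOI pattern is a loose heuristic, not a full validator):

```python
import re
from urllib.parse import urlsplit, parse_qs, unquote

# Loose heuristic for a DOI: "10.", a registrant code, a slash, a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def detail_id(url: str) -> str:
    """Return the `id` parameter of a detail.do URL, decoded twice."""
    query = parse_qs(urlsplit(url).query)
    once = query.get("id", [""])[0]  # parse_qs performs the first decode
    return unquote(once)             # second decode: %2F -> /, %5C -> \


def looks_like_doi(value: str) -> bool:
    """Return True if the decoded id has the general shape of a DOI."""
    return bool(DOI_PATTERN.match(value))


url = ("http://www.wanfangdata.com.cn:80/details/detail.do"
       "?_type=perio&id=10.1111%252Fnous.12161")
unit_id = detail_id(url)
print(unit_id, looks_like_doi(unit_id))
```

Ids that do not match the pattern (for example `zgxxws201609006` or `D01099412`) would be kept as opaque identifiers rather than reported as DOIs.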

DATA;HTML;
http://miner.wanfangdata.com.cn:80/themeBootPage/explainAndstatistics.do?themeWord=United%20states
http://miner.wanfangdata.com.cn:80/themeBootPage/explainAndstatistics.do?themeWord=Female

VIDEO;MISC; (which you can't play because it's Flash)
http://video.wanfangdata.com.cn:80/v/play/SI160419149.html
http://video.wanfangdata.com.cn:80/v/play/SL160727690.html
http://video.wanfangdata.com.cn:80/v/play/SD140110274.html
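
The groupings above can be summarized as a small host-and-path lookup. This is only a sketch of the classification in Python, not the actual ezPAARSE parser (those are written in JavaScript); the `RULES` table and `classify` function are hypothetical names, and the REF rows reflect the still-open REF versus ABSTRACT question.

```python
from urllib.parse import urlsplit

# (host, path prefix, rtype, mime), taken from the example URLs in this issue.
RULES = [
    ("www.wanfangdata.com.cn", "/search/searchList.do", "SEARCH", "HTML"),
    ("video.wanfangdata.com.cn", "/s/search/", "SEARCH", "HTML"),
    # REF is provisional: the issue leaves open whether this should be ABSTRACT.
    ("www.wanfangdata.com.cn", "/details/detail.do", "REF", "HTML"),
    ("miner.wanfangdata.com.cn", "/themeBootPage/explainAndstatistics.do", "DATA", "HTML"),
    ("video.wanfangdata.com.cn", "/v/play/", "VIDEO", "MISC"),
]


def classify(url: str):
    """Return {'rtype', 'mime'} for a recognized URL, or None otherwise."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname drops the :80 port and lowercases
    for rule_host, prefix, rtype, mime in RULES:
        if host == rule_host and parts.path.startswith(prefix):
            return {"rtype": rtype, "mime": mime}
    return None


print(classify("http://video.wanfangdata.com.cn:80/v/play/SI160419149.html"))
```

Matching on `hostname` rather than the raw netloc keeps the explicit `:80` port in these log lines from interfering with the comparison.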

wanfang_logs.txt