Type-IIx opened this issue 3 years ago
Great project so far. I wonder if you could add a feature to scrape journals? I'd like to configure the parser for that.
Sure, will do that and release an update soon.
@Type-IIx would you like to get involved in the project? This package would be even better if I knew what kind of functionality and structure you would love to use.
Perhaps! For now, what I envision is scraping journals. Example (try this URL yourself for a generic example): `http://{{libgen_root}}/scimag/journals/32406` returns a page whose header looks like:

```html
<h1 class="header"><a href="http://libgen.rs/">Library Genesis</a>: <a href="http://{{libgen_root}}/scimag/">Scientific articles</a></h1>
<p style="margin:1em 0;text-align:center">
Current alias domains are {{libgen_root0}}, {{libgen_root1}}, ...
```

This is a random journal/scimag: basically I went to {{libgen_root}}/scimag/journals/ and 32406 corresponds to [A][0] from browsing into the scimag/journals/ tree.
Journal page [template]: the journal page gives:

```
Journal:       | {{Journal_Title}}
Publisher:     | {{Publisher}}
ISSN (print):  | {{ISSN}}
Website:       | http://{{journal_site}}
Description:
{{description}}
All articles:  | search | DOI list
YYYY           | Volume 1 ... n
```
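A minimal parsing sketch for a page like the one templated above, assuming the labelled fields sit in a plain HTML table of "Label | value" rows (the mirror URL, table layout, and `scrape_journal` helper are my assumptions for illustration, not the package's API or a confirmed page structure):

```python
import requests
from bs4 import BeautifulSoup

LIBGEN_ROOT = "http://libgen.rs"  # assumed mirror; substitute a working alias


def scrape_journal(journal_id: int) -> dict:
    """Fetch a /scimag/journals/<id> page and pull out the labelled fields."""
    url = f"{LIBGEN_ROOT}/scimag/journals/{journal_id}"
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
    info = {}
    # Assumption: each field is a table row like "Journal: | <title>".
    for row in soup.select("table tr"):
        cells = [c.get_text(strip=True) for c in row.find_all("td")]
        if len(cells) >= 2:
            info[cells[0].rstrip(":")] = cells[1]
    return info


print(scrape_journal(32406))
```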
If you could provide methods such that `libgen.journal` (or `scimag`) exposes `.download_year()`, `.search_issn()`, and `.search_title()`, each returning a hash/array of hashes/arrays, e.g. `[2009, 2008, ...] => [1, 2, 3, 4]`, mapping this journal page's years to volumes, that would cover browsing. Further, you should be able to `download()` by year(s), issue(s), or DOI(s) (!!!). DOIs are particularly important for reference here, and it would be great to be able to search by DOI; the DOI list for each journal is always the journal/scimag root followed by `/doi`, probably (untested) `https?://{{libgen_root}}/scimag/\d{5}/doi`.
So perhaps a few different download and search methods are warranted; a rough sketch of the shape I mean follows.
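A hedged sketch of what such an interface might look like (the `Journal` class, method names, and return shapes are my illustration of the request above, not anything the package currently provides; the `/doi` URL follows the untested pattern mentioned earlier):

```python
from dataclasses import dataclass, field

LIBGEN_ROOT = "http://libgen.rs"  # assumed mirror


@dataclass
class Journal:
    """Hypothetical wrapper for one /scimag/journals/<id> page."""
    journal_id: int
    # e.g. {2009: [1, 2, 3, 4], 2008: [1, 2, 3]} mapping years to volumes
    volumes_by_year: dict[int, list[int]] = field(default_factory=dict)

    @property
    def doi_list_url(self) -> str:
        # Untested assumption: the DOI list lives at the journal root + /doi.
        return f"{LIBGEN_ROOT}/scimag/{self.journal_id}/doi"

    def download_year(self, year: int) -> list[str]:
        """Would download every article for a year; returns saved paths."""
        raise NotImplementedError

    def download(self, *, years=None, issues=None, dois=None) -> list[str]:
        """Would download by any mix of years, issues, or DOIs."""
        raise NotImplementedError
```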
P.S. I am no Python wizard!
I see that the API (JSON) does not support scimag yet! I went to the source (https://forum.mhut.org/viewtopic.php?f=17&t=6874) to ask about this.
I would suggest that you have the freedom to build methods that do not rely on their API; or you could contribute to their API.
A (clunky) method that apparently works: right now you can grab the SQL database dump from the "Download" section of the main site, http://{{libgen_root}}/scimag/, and search articles (e.g., by DOI) in it.
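A hedged sketch of that workaround, assuming you have imported the scimag dump into a local MySQL instance (the connection details, database name, and the `scimag(doi, ...)` table/column names are guesses for illustration; inspect the dump's actual schema before using this):

```python
import pymysql  # pip install pymysql

# Connection details and schema are assumptions; check your imported dump.
conn = pymysql.connect(host="localhost", user="root",
                       password="", database="libgen_scimag")


def find_by_doi(doi: str) -> list[dict]:
    """Look an article up by DOI in a locally imported scimag dump."""
    with conn.cursor(pymysql.cursors.DictCursor) as cur:
        # Assumed table name "scimag" with a "doi" column.
        cur.execute("SELECT * FROM scimag WHERE doi = %s", (doi,))
        return cur.fetchall()


print(find_by_doi("10.1000/xyz123"))  # example DOI, for illustration only
```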
I have already implemented journals; mags are on the way. The problem is I have very constrained time to spend on this project, but I am spending some.
Their API is not very good, so I had to scrape the results for better data.
Anyway, I hope you will love the next release; I will probably publish it in 10 to 15 days or so 😅 It will take time.
Thank you for showing interest 👍🏻