Yetangitu / books

Library Genesis (libgen) CLI/TUI/GUI client (mirror from private repo)
GNU General Public License v3.0
220 stars · 4 forks

Support for scimag (Journals) #1

Type-IIx opened this issue 3 years ago (status: Open)

Type-IIx commented 3 years ago

Is there active development for scimag journals in the LibGen API? With your project, do you have imminent plans to support scimag either with or without the API?

Yetangitu commented 3 years ago

There is no support in the API, but I do plan to add support in books in the near future (now that I'm done with classify: https://ipfs.io/ipfs/QmPjTqZ18NWLjpokUn9NwnaxKnSA8pVaQcwmGBK78SAYJB). This support will be comparable to the way libgen_fiction is supported, i.e. with a regular database refresh. I might start posting delta updates to IPFS; I'm not sure yet about the viability of such a scheme.

Yetangitu commented 3 years ago

I'm doing some experiments with the scimag dataset, which quickly showed that the most common type of query used in books is far too slow to be usable: a simple `select * from scimag where title like 'AN ASSESSMENT OF CHEMICAL%';` takes ~10 minutes on a dedicated database VM with 16GB of memory. This implies that only direct, absolute queries are usable: `select * from scimag where doi='10.1111/j.1745-4565.2004.tb00382.x';` returns its result in milliseconds.

The question is whether it makes sense to integrate libgen_scimag into books with these limitations. It would take a dedicated database server with at least 64GB of memory (the database + index currently takes ~46GB) to speed up partial (`like`) queries to something approaching usable performance. As it stands now, the only use case would be a direct download based on an exact DOI or title, i.e. `nscimag -d '10.2307/1219787'` or `nscimag -t 'An Assessment of Chemical Features for the Classification of Plant Fossils'` would download the article. Fulltext search performance over author and title is similarly abysmal. I don't think it makes sense to add support in this way, so I'll look into using 'net-based resources instead of a local database for scimag support for now.
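The gap between the two query shapes can be sketched with a tiny SQLite stand-in (the real backend is MySQL, and the column names and titles here are assumptions for illustration, not the actual scimag schema): an exact match on an indexed `doi` column is a B-tree seek, while a `LIKE` over an unindexed `title` forces a scan of every row.

```python
import sqlite3

# Toy stand-in for the scimag table (SQLite, not the real MySQL schema;
# column names and the second title are placeholders for illustration).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scimag (doi TEXT, title TEXT)")
con.execute("CREATE UNIQUE INDEX idx_doi ON scimag(doi)")
con.executemany("INSERT INTO scimag VALUES (?, ?)", [
    ("10.2307/1219787",
     "An Assessment of Chemical Features for the Classification of Plant Fossils"),
    ("10.1111/j.1745-4565.2004.tb00382.x", "Placeholder title"),
])

# Exact DOI lookup: resolved through the unique index, no table scan.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM scimag WHERE doi = '10.2307/1219787'"
).fetchall()
print(plan[0][-1])   # the plan mentions USING INDEX idx_doi

# Prefix LIKE on title: no index on title, so every row is visited;
# on the real ~46GB table this is the ~10-minute query.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM scimag WHERE title LIKE 'AN ASSESSMENT%'"
).fetchall()
print(plan[0][-1])   # the plan is a full SCAN of the table
```

The 64GB figure above is what it takes to keep the table plus index hot in memory so that even scans become tolerable; the index-seek path is fast regardless.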

Type-IIx commented 3 years ago

It's great to see that you have been working on the problem. I suppose it is such a massive dataset that, yes, way more RAM would be needed. I know you're a volunteer, so the fact that you're working on this problem so thoroughly is greatly appreciated.

I do admit, I would most often be searching by wildcard. However, the ability to download a journal by volume, for example, would be useful and probably a lot faster. I know there have been solutions to this. On the LibGen mhut forum there was an old project by backwar that now seems to have been killed off or is otherwise inaccessible.

Is development for LibGen moving to IPFS for the most part?

Yetangitu commented 3 years ago

I don't really know where libgen development is moving, the project has fractured into different forks which do not seem to get along all that well. The founder ("bookwarrior") can now be found at libgen.fun/libgen.life, he sees the people at mhut/libgen.rs as having betrayed the original intention of the project. I started working on books to make it possible to access libgen from a *nix terminal and do not consider myself to be part of any of the libgen projects, books can be used with or adapted to any fork which makes the database available for download.

I do see more and more rumblings about IPFS so I work from the assumption that "client access" will eventually move there. As to whether it will take over in the role of collection distribution currently done through bittorrent I do not know. There is an interesting technical demonstration on libgen.fun/libgen.life for an IPFS-distributed sqlite database + interface to libgen:

http://ipfs.io/ipfs/QmUqd8zbStKHfTHTo3cCLx7FR8t1g11WLKXk8m6Kyv7i5s/

(this link might stop working, the project is discussed at https://libgen.life/viewtopic.php?f=39&t=7940)

This uses a WebAssembly build of sqlite to access an IPFS-distributed database.

The same idea would probably work for the (far larger) scimag database. I'll have a look at adding support for this type of database access to books; it would make it far easier to keep the thing up to date and would remove the need for a local mysql installation.
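What makes this remote-access scheme workable is that the SQLite file format is self-describing: a client able to read arbitrary byte ranges (which is what the WASM demo does over HTTP against an IPFS gateway) can parse the 100-byte header, learn the page size, and then fetch only the B-tree pages a query actually touches. A minimal local sketch of that first step, with a throwaway file standing in for the remotely hosted database (file name is a placeholder):

```python
import os
import sqlite3
import tempfile

# Build a throwaway database file to stand in for the remotely hosted one.
path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t (x INTEGER)")
con.commit()
con.close()

# A remote client would fetch these 100 bytes with an HTTP Range request
# ("Range: bytes=0-99" against a gateway URL); here we read them from disk
# to show what they contain.
with open(path, "rb") as f:
    header = f.read(100)

magic = header[:16]                               # b"SQLite format 3\x00"
page_size = int.from_bytes(header[16:18], "big")  # page size in bytes (1 encodes 65536)
print(magic, page_size)
```

Since every subsequent page read is just another byte range, the database never needs to be downloaded in full, which is exactly what makes a ~46GB scimag table plausible to query from a browser or thin client.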

Type-IIx commented 3 years ago

Ah, okay. First of all: thank you for the social information re: the fork-schism between mhut and libgen.life. Important.

So, as a new registrant, I am unable to view https://libgen(dot)life/viewtopic.php?f=39&t=7940

I am supportive of the design of IPFS; however, I do take issue with its approach to location-hidden (Tor) servers and clients: see https://github.com/ipfs/notes/issues/37 (perfect being the enemy of good). Any real hope of reinvigorating that effort seems to have dwindled or died off (https://github.com/berty/go-libp2p-tor-transport), with the go-onion-transport left to OpenBaz--r, a not-quite-legitimate service in my estimation. One more cynical point: the scihut BitTo--ent distribution of sci-mag and other collections remains a PoC (https://git[dot]sr[dot]ht/~scihut/scihut) without much progress. It is interesting, though.

Less cynically: