dspinellis / alexandria3k

Local relational access to openly-available publication data sets
GNU General Public License v3.0

Add USPTO sampling #17

Closed: AggelosMargkas closed this 10 months ago

AggelosMargkas commented 11 months ago

Add sampling to USPTO to control which Zip files and containers are processed. The sampling callable returns a two-element tuple and defaults to `lambda n: (True, True)`.

When the first element of the returned tuple is True, the Zip file is processed; when it is False, the Zip file is skipped. Similarly, when the second element is True, the container is processed; when it is False, the container is skipped.

Zip file sampling is handled in uspto.py, while container sampling is handled in uspto_zip_cache.py.
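To illustrate the two-level sampling described above, here is a minimal, self-contained sketch. It is not the code in this PR: `process_archives`, the `archives` dictionary, and the assumption that the callable receives a file or container name are all hypothetical and chosen only for illustration.

```python
def process_archives(archives, sample=lambda n: (True, True)):
    """Return the (zip_name, container_name) pairs selected by `sample`.

    `archives` is a hypothetical stand-in for the real data source:
    a dict mapping each Zip file name to a list of container names.
    `sample` returns a (process_zip, process_container) tuple.
    """
    selected = []
    for zip_name, containers in archives.items():
        # First tuple element: decide whether to open this Zip file at all.
        if not sample(zip_name)[0]:
            continue
        for container in containers:
            # Second tuple element: decide whether to process this container.
            if sample(container)[1]:
                selected.append((zip_name, container))
    return selected


# Illustrative data; the names are made up.
archives = {
    "a.zip": ["a1.xml", "a2.xml"],
    "b.zip": ["b1.xml"],
}

# Default sampler: everything is processed.
everything = process_archives(archives)

# Custom sampler: skip "b.zip" and any container ending in "2.xml".
some = process_archives(
    archives,
    sample=lambda n: (n != "b.zip", not n.endswith("2.xml")),
)
```

With the default callable every container in every Zip file is selected. The custom callable shows that skipping a Zip file also skips all of its containers, since the container check is never reached.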

Adjust the caching tests so that they pass the CI checks.

AggelosMargkas commented 10 months ago

This is proving difficult to get right. I added some comments regarding required fixes.

On it!