ReinV / SCOPE

Search and Chemical Ontology Plotting Environment

download_files.py slow or stalled #29

Closed magnuspalmblad closed 3 years ago

magnuspalmblad commented 3 years ago

When running the download_files.py script, the downloads appear to stall at 2010-2019_ChEBI_IDs.tsv. I could download this file from osf.io myself, and it took just under a minute. But the script has now been working on this file for more than 2 hours. Is this something you have seen too? Is OSF throttling the download, or is the script stalled?

This is my output:

(base) C:\Users\Magnus Palmblad\Downloads\SCOPE-master>python download_files.py
ChEBI2Class.pkl
ChEBI2logP.tsv
ChEBI2Mass.tsv
ChEBI2logS.tsv
ChEBI2logS.tsv
ChEBI2Names.tsv
ChEBI2Smiles.tsv
2020-2029_ChEBI_IDs.tsv
2010-2019_ChEBI_IDs.tsv

magnuspalmblad commented 3 years ago

FYI, the file is currently 25 MB, and the 2020-2029 file 0 bytes.

ReinV commented 3 years ago

No, it's also downloaded in a minute when I run the script and saved in the correct folder.

ReinV commented 3 years ago

Can you rerun the script and see if it happens again? (Delete the files and searches folders first.) If it does, we need to add some lines and do some testing to see what is happening.

magnuspalmblad commented 3 years ago

I am rerunning it now, and realized I had to delete the two folders first. The behavior I expected was that it would just download these again and overwrite the existing files. (An even more elegant solution would be to check whether the local files differ from the files on osf.io.)
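One way to sketch the "check if the local files are different" idea is to compare a checksum of the local file against an expected value. This is only an illustration, not what download_files.py does: the helper names `file_sha256` and `needs_download` are hypothetical, and it assumes an expected SHA-256 value for each file is available from somewhere (for example, published alongside the files).

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, expected_sha256):
    """Return True if the file is missing or its checksum differs from the expected one.

    expected_sha256 is an assumption: a known-good hex digest supplied by the caller.
    """
    if not os.path.isfile(path):
        return True
    return file_sha256(path) != expected_sha256
```

With a check like this, a partially downloaded or outdated file would be fetched again, while an intact file would be skipped.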

ReinV commented 3 years ago

The script now checks if the folders exist and if so, it does nothing. I guess this is not the most obvious behaviour but it works when new users pull from github and then run the script.
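A minimal sketch of the behaviour described here, for illustration only: the function names `folders_missing` and `maybe_download`, and the `download` callback, are hypothetical and not taken from download_files.py. The folder names are passed in by the caller rather than assumed.

```python
import os


def folders_missing(base_dir, folder_names):
    """Return the folder names that do not yet exist under base_dir."""
    return [name for name in folder_names
            if not os.path.isdir(os.path.join(base_dir, name))]


def maybe_download(base_dir, folder_names, download):
    """Run download() only if at least one of the folders is missing.

    Returns True if a download was started, False if it was skipped.
    """
    missing = folders_missing(base_dir, folder_names)
    if not missing:
        # Printing a message makes the "do nothing" case visible to the user.
        print("Folders already exist, skipping download.")
        return False
    for name in missing:
        os.makedirs(os.path.join(base_dir, name))
    download()
    return True
```

Printing a message in the skip branch addresses the surprise reported above, where a rerun silently did nothing because the folders were still present.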

magnuspalmblad commented 3 years ago

OK, now I received three decade files. Is this the correct behavior?

(base) C:\Users\Magnus Palmblad\Downloads\SCOPE-master>ls -l searches_by_year
total 1044683
-rw-rw-rw-   1 user     group    287649374 Sep  9 15:06 2000-2009_ChEBI_IDS.tsv
-rw-rw-rw-   1 user     group    756443045 Sep  9 15:05 2010-2019_ChEBI_IDs.tsv
-rw-rw-rw-   1 user     group    25661453 Sep  9 15:02 2020-2029_ChEBI_IDs.tsv

(base) C:\Users\Magnus Palmblad\Downloads\SCOPE-master>

ReinV commented 3 years ago

Yes! I guess we should at least add a "download completed" print statement. My plan was to see if this also works for you; then we can add the rest to the downloads later.
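A rough sketch of what such a "download completed" message could look like, assuming the file is copied from a readable stream to a writable one in chunks. The function name `save_stream` and its parameters are hypothetical, not part of the existing script.

```python
def save_stream(name, source, destination, chunk_size=1 << 20):
    """Copy source to destination in chunks and report when the copy is done.

    source and destination are binary file-like objects; returns the byte count.
    """
    print(f"{name}: downloading...")
    written = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        destination.write(chunk)
        written += len(chunk)
    print(f"{name}: download completed ({written} bytes)")
    return written
```

The source could, for instance, be the response object returned by `urllib.request.urlopen(url, timeout=60)`, so that a stalled connection raises an error instead of hanging. Reporting the byte count also makes a 0-byte or truncated file easy to spot in the output.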