I should add that I have the individual year files in the "searches_by_year" folder, not the decade ones.
I see now that I also get this error with other, smaller searches, such as our "APCI" and "HILIC" examples...
That makes sense, because I expect this is caused by the "searches_by_year" files. Are you using all of them? You could always try using only one decade (the most recent one, for example).
Obviously we should make sure "make_table.py" can run with all searches-by-year files, but for now you can check whether fewer files work.
"Searches_by_decade" are more concise and therefore more memory efficient for "make_table.py". I'm not sure if I ever tested the newest "make_table.py" script using the "searches_by_year" files.
See https://osf.io/cfjde/ for a link to the 2010-2019 searches-by-decade file.
I replaced the individual years with the decades, but then I get this error:
(base) G:\Projects\Reinier Vleugels\SCOPE-master_2022>python make_table.py -i results -t folder
getting searches by year ...
Traceback (most recent call last):
  File "make_table.py", line 160, in <module>
    main()
  File "make_table.py", line 135, in main
    data = import_properties()
  File "make_table.py", line 60, in import_properties
    df['ChEBI'] = df['ChEBI'].astype(int)
  File "C:\Users\nmpalmblad\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2773, in __getitem__
    if self.columns.is_unique and key in self.columns:
  File "C:\Users\nmpalmblad\Anaconda3\lib\site-packages\pandas\core\generic.py", line 5270, in __getattr__
    return object.__getattribute__(self, name)
  File "pandas\_libs\properties.pyx", line 63, in pandas._libs.properties.AxisProperty.__get__
  File "C:\Users\nmpalmblad\Anaconda3\lib\site-packages\pandas\core\generic.py", line 5270, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute '_data'
(base) G:\Projects\Reinier Vleugels\SCOPE-master_2022>ls searches_by_year
1940-1949_ChEBI_IDS.tsv 1980-1989_ChEBI_IDS.tsv 2020-2029_ChEBI_IDs.tsv
1950-1959_ChEBI_IDS.tsv 1990-1999_ChEBI_IDS.tsv pre1945_ChEBI_IDs.tsv
1960-1969_ChEBI_IDS.tsv 2000-2009_ChEBI_IDS.tsv
1970-1979_ChEBI_IDS.tsv 2010-2019_ChEBI_IDs.tsv
(base) G:\Projects\Reinier Vleugels\SCOPE-master_2022>
This error occurs when processing the files from the "files" folder. Could you first check that you have all the recent files from the OSF storage?
Yes, I get the exact same error with the files from the "files" folder on the OSF.
I cannot reproduce this error, so before I ask you to add debug print statements, can you also check 1) that you have the latest version of the "make_table.py" script and 2) that your pandas package is up to date (pip install --upgrade pandas)?
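For reference, a quick way to check which pandas the script actually imports in that environment (plain pandas attributes, nothing SCOPE-specific):

import pandas as pd
print(pd.__version__)  # version make_table.py will pick up in this environment
print(pd.__file__)     # location of that pandas installation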
Still the same issue?
Updating pandas solved this issue.
When running a search for COVID-19-related papers published in the last two years, which returns ~360,000 hits, make_table.py runs out of memory: it cannot allocate 16 GB, even though my 64 GB workstation has almost 60 GB free. The search string was:

(FIRST_PDATE:[2020-01-01 TO 2022-12-31]) AND ("2019-nCoV" OR "2019nCoV" OR "COVID-19" OR "SARS-CoV-2" OR ("wuhan" AND "coronavirus") OR "Coronavirus" OR "Corona virus" OR "corona-virus" OR "corona viruses" OR "coronaviruses" OR "SARS-CoV" OR "Orthocoronavirinae" OR "MERS-CoV" OR "Severe Acute Respiratory Syndrome" OR "Middle East Respiratory Syndrome" OR ("SARS" AND "virus") OR "soluble ACE2" OR ("ACE2" AND "virus") OR ("ARDS" AND "virus") or ("angiotensin-converting enzyme 2" AND "virus"))
This is the output from SCOPE:
Any ideas why this happens, or how to fix it?
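If it would help, I could also try loading the large result table in chunks rather than all at once; a rough sketch of what I mean (the file name, separator and "ChEBI" column here are placeholders on my side, not the actual make_table.py internals):

import pandas as pd

# Hypothetical: accumulate ChEBI ID counts from a large TSV chunk by chunk,
# so the whole table never has to sit in memory at once.
counts = None
for chunk in pd.read_csv("large_search_results.tsv", sep="\t", chunksize=100_000):
    c = chunk["ChEBI"].value_counts()
    counts = c if counts is None else counts.add(c, fill_value=0)

print(counts.sort_values(ascending=False).head(10))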