MaterialEyes / exsclaim

A toolkit for the automatic construction of self-labeled materials imaging datasets from scientific literature
GNU General Public License v3.0

UnpicklingError: invalid load key, '<'. #26

Closed. GrantViz closed this issue 2 years ago

GrantViz commented 2 years ago

I'm having problems getting the program running. The same error occurs both on the command line and when importing into Python, using the JSON query file below. This is a fresh, up-to-date pip install.

{   
    "name": "OREUTest",

    "journal_family": "nature",

    "maximum_scraped": 5,

    "sortby": "relevant",

    "query":
    {
        "search_field_1":
        {
            "term":"Ag nanoparticle",
            "synonyms":["Ag nanoparticles", "silver nanoparticle", "silver nanoparticle", "nanoparticles of silver", "AgNPs", "AgNP", "Ag NPs", "silver NPs", "silver NP"]
        },
        "search_field_2":
        {
            "term":"HAADF-STEM",
            "synonyms":["HAADF STEM", "High-angle annular dark-field STEM","HAADF image", "(HAADF)", "High-angle ADF", "high-angle annular dark-field (HAADF) imaging", "HAADF imaging"]
        }
    },

    "open": true,

    "save_format": ["postgres", "csv"],

    "logging": ["print", "exsclaim.log"],

    "results_dir": "C:/Users/gblan/anaconda3/Lib/site-packages/extracted"
}

I ran it with the following Python code:

from exsclaim.pipeline import Pipeline
test_pipeline = Pipeline("C:/Users/gblan/OREU Test/OREUTest.json")
results = test_pipeline.run()

I expected 5 scraped journal articles. Running in a conda environment, I got the following output after the exsclaim logo:

Traceback (most recent call last):

  File "C:\Users\gblan\.spyder-py3\temp.py", line 3, in <module>
    results = test_pipeline.run()

  File "C:\Users\gblan\anaconda3\lib\site-packages\exsclaim\pipeline.py", line 98, in run
    tools.append(FigureSeparator(self.query_dict))

  File "C:\Users\gblan\anaconda3\lib\site-packages\exsclaim\figure.py", line 64, in __init__
    self._load_model()

  File "C:\Users\gblan\anaconda3\lib\site-packages\exsclaim\figure.py", line 93, in _load_model
    self.object_detection_model = self.load_model_from_checkpoint(

  File "C:\Users\gblan\anaconda3\lib\site-packages\exsclaim\figure.py", line 147, in load_model_from_checkpoint
    model.load_state_dict(torch.load(checkpoint, map_location="cpu"))

  File "C:\Users\gblan\anaconda3\lib\site-packages\torch\serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)

  File "C:\Users\gblan\anaconda3\lib\site-packages\torch\serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)

UnpicklingError: invalid load key, '<'.

Similar errors I found elsewhere were caused by unfinished downloads from Google Docs.
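
A quick way to test the unfinished-download theory is to look at the first bytes of the checkpoint file that torch.load is trying to read. The sketch below is only a diagnostic, not part of exsclaim: the helper name looks_like_html is made up for illustration, and the path in the usage comment is hypothetical. A real PyTorch checkpoint is a zip archive or a pickle and does not start with '<', whereas a saved web page does.

```python
def looks_like_html(path, probe_bytes=512):
    """Return True if the file starts like an HTML/XML page.

    Hypothetical helper for diagnosis only; it is not part of exsclaim.
    PyTorch checkpoints (zip archives or pickles) never begin with '<',
    so a leading '<' suggests a web page was saved in place of the model.
    """
    with open(path, "rb") as f:
        # Only the first few bytes are needed, so large files stay cheap to probe.
        head = f.read(probe_bytes).lstrip()
    return head.startswith(b"<")


# Example usage (the path is a placeholder; substitute your own checkpoint file):
# print(looks_like_html("path/to/checkpoint.pt"))
```

If this returns True, the file on disk is not a model. Deleting it before retrying may be needed, assuming the package only downloads a checkpoint when the file is missing.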

WeixinGithubJiang commented 2 years ago

Thanks, the issue can be solved by following the instructions in https://github.com/MaterialEyes/exsclaim/issues/27#issue-1260003635

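The reason behind this error is that the requests-based download does not handle the large checkpoint file correctly, so what gets saved is not a valid PyTorch checkpoint (the '<' load key suggests an HTML page was saved instead).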
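
For anyone patching the download step by hand, here is a minimal sketch of a streamed download that refuses to keep an HTML response. It is not the fix from issue #27 and not exsclaim's own download.py: the function names save_stream and download_checkpoint are hypothetical, and the URL and destination are whatever you pass in. It does not handle any confirmation step the file host may require for large files. It only makes that failure visible instead of leaving a broken checkpoint on disk.

```python
import os


def save_stream(chunks, dest):
    """Write an iterable of byte chunks to dest, rejecting HTML payloads.

    Data goes to a temporary '.part' file first and is moved into place only
    if the stream is non-empty and does not start with '<'.
    """
    tmp = dest + ".part"
    try:
        with open(tmp, "wb") as f:
            first = True
            for chunk in chunks:
                if not chunk:
                    continue  # skip empty keep-alive chunks
                if first and chunk.lstrip().startswith(b"<"):
                    raise ValueError("download looks like an HTML page, not a checkpoint")
                first = False
                f.write(chunk)
        if first:
            raise ValueError("download was empty")
        os.replace(tmp, dest)
    except BaseException:
        # Never leave a partial or invalid file behind.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def download_checkpoint(url, dest, chunk_size=1 << 20):
    """Stream url to dest in chunk_size pieces (hypothetical helper)."""
    import requests  # third-party; imported here so save_stream works without it

    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        save_stream(resp.iter_content(chunk_size=chunk_size), dest)
```

Streaming with stream=True and iter_content keeps memory use flat regardless of file size, and writing to a temporary file means an interrupted or invalid download is never mistaken for a finished checkpoint.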

JStuckner commented 2 years ago

I learned about this code at MRS 2022 and I'm very excited to try it. However, I'm having the same error and changing download.py did not help. I would appreciate any help.