compSPI / ioSPI

I/O and Data Visualization
MIT License
4 stars 7 forks source link

OSF Data Download and Upload Scripts #25

Closed arjunsingh3600 closed 2 years ago

arjunsingh3600 commented 2 years ago

@thisTyler @jedyeo @calhep TEMUpload uploads data generated by TEMSimulator in the structure to CryoEM-dataset page the discussed here.

The implementation for generate_tags_from_tem() and test_post_files() will be merged in an upcoming PR.

arjunsingh3600 commented 2 years ago

The unit tests for TEMUpload are currently failing as they require an authorization token from OSF.io to run. I do have an authorization token that is scoped to have access to the OSF page, however, I wasn't sure what the best practice for sharing that was. Is there anything in the existing git workflow to accommodate this?

geoffwoollard commented 2 years ago

There's some issues with importing. Have a look at the details of the failed Linting and Testing

==================================== ERRORS ====================================
____________________ ERROR collecting tests/test_fourier.py ____________________
ImportError while importing test module '/home/runner/work/ioSPI/ioSPI/tests/test_fourier.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/share/miniconda/lib/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_fourier.py:5: in <module>
    from ioSPI import fourier
E   ImportError: cannot import name 'fourier' from 'ioSPI' (/home/runner/work/ioSPI/ioSPI/__init__.py)
__________________ ERROR collecting tests/test_tem_upload.py ___________________
ImportError while importing test module '/home/runner/work/ioSPI/ioSPI/tests/test_tem_upload.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/share/miniconda/lib/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_tem_upload.py:9: in <module>
    from ioSPI.ioSPI import tem_upload
ioSPI/tem_upload.py:6: in <module>
    from simSPI.simSPI import tem
E   ModuleNotFoundError: No module named 'simSPI'
__________ ERROR collecting tests/test_iotools/test_atomic_models.py ___________
ImportError while importing test module '/home/runner/work/ioSPI/ioSPI/tests/test_iotools/test_atomic_models.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/share/miniconda/lib/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_iotools/test_atomic_models.py:8: in <module>
    from ioSPI.iotools.atomic_models import read_gemmi_model, write_gemmi_model
E   ModuleNotFoundError: No module named 'ioSPI.iotools'
___________ ERROR collecting tests/test_iotools/test_cryo_dataset.py ___________
ImportError while importing test module '/home/runner/work/ioSPI/ioSPI/tests/test_iotools/test_cryo_dataset.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/share/miniconda/lib/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_iotools/test_cryo_dataset.py:6: in <module>
    from ioSPI.iotools import cryodataset as cryo_dataset
E   ModuleNotFoundError: No module named 'ioSPI.iotools'
geoffwoollard commented 2 years ago

You need to install simSPI in https://github.com/compSPI/ioSPI/blob/dataset/upload/environment.yml from github

See how in simSPI we import ioSPI from git hub: https://github.com/compSPI/simSPI/blob/master/environment.yml#L23

arjunsingh3600 commented 2 years ago

Converted to draft to review and refactor

geoffwoollard commented 2 years ago

@arjunsingh3600 simspi isn't installing. it's not in requirements.txt. I think the error will go away when you merge with the updated main.

jedyeo commented 2 years ago

the tests are reliant on tem.py..., we are running into conflicting directory structure

ninamiolane commented 2 years ago

Would this work now that the errors in tem.py have been solved? could you merge with master? Thanks!

thisFreya commented 2 years ago

@ninamiolane Would this work now that the errors in tem.py have been solved? could you merge with master? Thanks!

Unfortunately this issue is connected to the format of the test, which uses simSPI to test ioSPI functionality. This doesn't work as simSPI imports ioSPI to begin with, and circular imports are less than ideal. The test needs some reworking, we will get to it ASAP so that this can be merged.

ninamiolane commented 2 years ago

I see, thank you for the fast answer.

Why does this file worry about the tem simulator? ioSPI should be a self-consistent package (see new mission statement in slack): ioSPI provides tools to read, write and convert biological imaging and shape data: from different formats (gemmi, mrc), for different imaging modalities (cryo em at the moment), at different scales (atomic models, electrostatic maps) for researchers to explore and visualize data from structural biology.

We would, of course use, it for tem-generated data, but the function would be general.

In practice:

Does this make sense? @fredericpoitevin what do you think?

fredericpoitevin commented 2 years ago

I agree, I think it makes sense.

A quick look at the tests suggests that a merge from the current master branch would help, though.

So I'd vote in favor of merging and assessing things again after.

geoffwoollard commented 2 years ago

Follow NIna and Fred's lead on in what repos things should go.

To upload data to OSF it needs to be generated from simSPI and then use ioSPI's functionality.

I think in the past I was under the impression that this uploading simulated data would happen through ioSPI. But now let's go with all metadata being generated within simSPI. ioSPI can just be a module that uploads to OSF. A "uploading notebook" that simulates with simSPI and then uploads to OSF with ioSPI can live in simSPI.

This streamlines what ioSPI does. You will likely have to generalize what features and categories OSF is made aware of. But you can set the rules within ioSPI. For instance, instead of having information about noise and ctf params, you can just have some rules for a general "OSF tag string". And it's up to simSPI or the user using ioSPI to set than on their own. It wouldn't be hard coded in ioSPI.

geoffwoollard commented 2 years ago

I'm confused. Are we going ahead with PR https://github.com/compSPI/ioSPI/pull/40, or is a major move of tem specific code from ioSPI --> simSPI called for?

thisFreya commented 2 years ago

@geoffwoollard I'm confused. Are we going ahead with PR #40, or is a major move of tem specific code from ioSPI --> simSPI called for?

We are currently in the process of adjusting this PR to match up with master (which has had PR#46 merged in). That PR (refactoring) moved things into ioSPI that were TEM-specific - the major move has already happened. All the code we will push into ioSPI will be TEM-agnostic, whereas TEM-specific items will be pushed into their relevant places in simSPI. Unfortunately this will take us some time to get sorted which has caused some confusion - apologies!

codecov[bot] commented 2 years ago

Codecov Report

Merging #25 (8614ad7) into master (1f37233) will decrease coverage by 1.10%. The diff coverage is 93.62%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #25      +/-   ##
==========================================
- Coverage   97.53%   96.43%   -1.09%     
==========================================
  Files           4        5       +1     
  Lines         121      168      +47     
==========================================
+ Hits          118      162      +44     
- Misses          3        6       +3     
Impacted Files Coverage Δ
__init__.py 100.00% <ø> (ø)
ioSPI/datasets.py 93.62% <93.62%> (ø)
ioSPI/__init__.py

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 1f37233...8614ad7. Read the comment docs.

geoffwoollard commented 2 years ago

We are currently in the process of adjusting this PR to match up with master (which has had PR#46 merged in). That PR (refactoring) moved things into ioSPI that were TEM-specific - the major move has already happened. All the code we will push into ioSPI will be TEM-agnostic, whereas TEM-specific items will be pushed into their relevant places in simSPI. Unfortunately this will take us some time to get sorted which has caused some confusion - apologies!

Ok so don't merge any more code that isn't aligned with the vision of ioSPI.

arjunsingh3600 commented 2 years ago

@ninamiolane @geoffwoollard Thank you for the review! As pointed out while the script was TEM-agnostic, the docstrings/method names certainly weren't. I've made a few changes based on the feedback above.

Please let me know if these changes work.