Roestlab / massdash

MassDash: A web-based dashboard for streamlined DIA-MS visualization, analysis, prototyping, and optimization
https://massdash.streamlit.app/
BSD 3-Clause "New" or "Revised" License

Josh's Suggestions for rawData refactoring #56

Closed jcharkow closed 6 months ago

jcharkow commented 7 months ago

Here is my attempt at refactoring the rawData code a bit. The main idea of this refactor is to make the loader interface more consistent between SqMassLoader and the raw data extraction, so that both are easier to use from Python.

Note: This has not been tested and is likely very buggy, but I wanted to start the PR early so everyone is up to date.

Major Changes include:

  1. Major refactoring of loaders
    • create a new "access" folder, which is meant to contain the more "low level" access methods
      • e.g. direct pyopenms calls and SQL queries
    • mzMLDataLoader.py - links an mzML file, a results file and a spectral library to do the heavy lifting of targeted extraction. The main method is loadFeatureMaps(), which makes it more consistent with how SqMassLoader is implemented. This replaces OSWLoader, DIANNLoader, TargetedExtractionLoader and MzMLLoader.
    • Reporttsv is renamed to ResultsTSVDataAccess. Currently only the DIA-NN TSV is supported; I will work on adding the OSW .tsv.
    • GenericLoader is changed to be the parent of both mzMLDataLoader and SqMassDataLoader (and a future .d loader)
    • TransitionGroupFeature now stores more meta info so that this data structure can be used more widely
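
To make the intended hierarchy concrete, here is a rough sketch of how a shared parent with a common loadFeatureMaps() method could look. This is an illustration only, not the code in this branch: only the names GenericLoader, mzMLDataLoader, SqMassDataLoader and loadFeatureMaps() come from the description above, while the constructor arguments, the method signature and the placeholder return values are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class GenericLoader(ABC):
    """Shared parent: every loader links one results file and one or more data files."""

    def __init__(self, rsltsFile: str, dataFiles: List[str]):
        # Argument names are hypothetical, chosen only for this sketch
        self.rsltsFile = rsltsFile
        self.dataFiles = dataFiles

    @abstractmethod
    def loadFeatureMaps(self, peptide: str, charge: int) -> Dict[str, str]:
        """Return one entry per data file for the requested precursor."""


class SqMassDataLoader(GenericLoader):
    def loadFeatureMaps(self, peptide: str, charge: int) -> Dict[str, str]:
        # Placeholder: a real loader would read the sqMass file here
        return {f: f"sqMass:{peptide}/{charge}" for f in self.dataFiles}


class mzMLDataLoader(GenericLoader):
    def __init__(self, rsltsFile: str, dataFiles: List[str], libraryFile: str):
        super().__init__(rsltsFile, dataFiles)
        # The spectral library is only needed for targeted extraction from mzML
        self.libraryFile = libraryFile

    def loadFeatureMaps(self, peptide: str, charge: int) -> Dict[str, str]:
        # Placeholder: a real loader would run the targeted extraction here
        return {f: f"mzML:{peptide}/{charge}" for f in self.dataFiles}
```

With a layout like this, calling code can treat both loaders the same way, e.g. loop over a list of loaders and call loadFeatureMaps("PEPTIDEK", 2) on each without checking which file type is behind it.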

Server methods will have to be adjusted to match these changes, which I am currently working on.

jcharkow commented 7 months ago

@singjc I know that this is still in progress but can you please have a quick look (even just at the description) to let me know if this refactoring sounds ok? E.g. not going to screw everything up?

singjc commented 7 months ago

@jcharkow Looks fine / makes sense. I renamed the reportLoader already in the main branch, so there may be some conflict there. I just pulled in the most recent changes from the main feature/rawdata branch, but there are some conflicts with some of the changes I was working on. I will fix those and then probably leave it for you to work on the rest of the refactoring.

I did update the get_top_rank_feature methods in oswDataAccess to use a feature hash table, so that lookups only index on indices.
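
For readers following along, the general idea of a hash-table lookup for the top-ranked feature can be sketched as below. This is a generic illustration, not the oswDataAccess implementation: the row layout, the (peptide, charge) key and the helper build_top_rank_index are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical feature row: (peptide, charge, rank, retention_time)
Row = Tuple[str, int, int, float]
Key = Tuple[str, int]


def build_top_rank_index(rows: List[Row]) -> Dict[Key, int]:
    """Map (peptide, charge) to the list position of its best-ranked (lowest rank) row."""
    index: Dict[Key, int] = {}
    for i, (peptide, charge, rank, _rt) in enumerate(rows):
        key = (peptide, charge)
        # Keep the position of the row with the smallest rank seen so far
        if key not in index or rank < rows[index[key]][2]:
            index[key] = i
    return index


def get_top_rank_feature(rows: List[Row], index: Dict[Key, int],
                         peptide: str, charge: int) -> Optional[Row]:
    """Look up the top-ranked row by position, without scanning all rows."""
    i = index.get((peptide, charge))
    return None if i is None else rows[i]
```

The index is built once in a single pass; each later lookup is a dictionary access followed by a positional read, rather than a filter over the whole table.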

jcharkow commented 7 months ago

Thanks for looking it over and letting me refactor your code. I'm sorry that we seem to be getting a lot of conflicts, but hopefully this leads to a more unified interface overall :)

jcharkow commented 7 months ago

@singjc this branch is ready for your review now. All of the same functionality should be there as before; however, I did some refactoring to make the Python interface cleaner. I also did some general refactoring that should make things more readable.

Main points are summarized below

Note: the Streamlit interface is not heavily tested, and I may not have used caching in the right places, so it would be great if you could help with that. I am also not sure whether using these "high level functions" in TargetedRawDataExtraction screws up the caching.

singjc commented 7 months ago

Just testing this out, but I seem to run into an error with imports?

2023-12-28 00:47:45.069 Uncaught app exception
Traceback (most recent call last):
  File "/home/justincsing/anaconda3/envs/py39/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 534, in _run_script
    exec(code, module.__dict__)
  File "/media/justincsing/ExtraDrive1/Documents2/Roest_Lab/Github/MassSeer/massseer/gui.py", line 9, in <module>
    from massseer.server.RawTargetedExtractionAnalysisServer import RawTargetedExtractionAnalysisServer
  File "/media/justincsing/ExtraDrive1/Documents2/Roest_Lab/Github/MassSeer/massseer/server/RawTargetedExtractionAnalysisServer.py", line 9, in <module>
    from massseer.ui.RawTargetedExtractionAnalysisUI import RawTargetedExtractionAnalysisUI
  File "/media/justincsing/ExtraDrive1/Documents2/Roest_Lab/Github/MassSeer/massseer/ui/RawTargetedExtractionAnalysisUI.py", line 11, in <module>
    from massseer.loaders.access.TargetedDIADataAccess import TargetedDIAConfig
ModuleNotFoundError: No module named 'massseer.loaders.access.TargetedDIADataAccess'
2023-12-28 00:47:45.069 Uncaught app exception
Traceback (most recent call last):
  File "/home/justincsing/anaconda3/envs/py39/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 534, in _run_script
    exec(code, module.__dict__)
  File "/media/justincsing/ExtraDrive1/Documents2/Roest_Lab/Github/MassSeer/massseer/gui.py", line 9, in <module>
    from massseer.server.RawTargetedExtractionAnalysisServer import RawTargetedExtractionAnalysisServer
ImportError: cannot import name 'RawTargetedExtractionAnalysisServer' from 'massseer.server.RawTargetedExtractionAnalysisServer' (/media/justincsing/ExtraDrive1/Documents2/Roest_Lab/Github/MassSeer/massseer/server/RawTargetedExtractionAnalysisServer.py)

UPDATE: Fixed import in commit fe6dd6624091c2557d4963c9b0cec4708854b9f2