SlavovLab / DART-ID

DART-ID: retention time alignment and peptide identification confidence updates
https://dart-id.slavovlab.net
MIT License

Receiving Import error #16

Open Lukas67 opened 1 year ago

Lukas67 commented 1 year ago

Hi,

An error is returned when running the program:

ImportError: cannot import name 'gcd' from 'fractions' (/home/lukas/anaconda3/lib/python3.9/fractions.py)

According to Stack Overflow, the problem is caused by the networkx module, whose import statements break on newer Python versions (I am using Python 3.9.7).
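For context: `fractions.gcd` was deprecated in Python 3.5 and removed in Python 3.9, with `math.gcd` as its replacement. Any code that still does `from fractions import gcd` (as older networkx releases reportedly do) therefore fails on 3.9. A minimal sketch of the compatibility import that avoids the error, assuming you control the importing code; this is an illustration, not a patch shipped by networkx or DART-ID, and `describe_gcd_source` is a hypothetical helper:

```python
import sys

# fractions.gcd was removed in Python 3.9; math.gcd (added in 3.5) replaces it.
try:
    from fractions import gcd  # only exists on Python < 3.9
except ImportError:
    from math import gcd


def describe_gcd_source() -> str:
    """Report which module supplied gcd on this interpreter (hypothetical helper)."""
    major, minor = sys.version_info[:2]
    return f"Python {major}.{minor}: gcd from {gcd.__module__}"


if __name__ == "__main__":
    print(describe_gcd_source())
    print(gcd(12, 18))  # 6 on any supported Python version
```

Upgrading networkx, or running DART-ID on the Python 3.7 environment it was built for, avoids the need for any such shim.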

BR Lukas

atc3 commented 1 year ago

The DART-ID conda environment (https://github.com/SlavovLab/DART-ID/blob/master/environment.yml) is set up to run Python 3.7.6. Is there a specific reason you need to run Python 3.9.7?

Lukas67 commented 1 year ago

Hi,

thanks for your reply.

I need that particular version to run other programs.

Your description of the program says that it runs on Python >= 3.7.

BR Lukas

atc3 commented 1 year ago

Thanks for letting me know about the description. This program was released when Python 3.7 was the latest version, and I was trying to communicate that it would work with any 3.7 release. I will update the description to be more explicit about this requirement.

In the meantime, you should be able to use python virtualenvs (https://docs.python.org/3/library/venv.html) or conda environments (https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) to run DART-ID in a separate python environment, so that you can still run your other programs on their other python versions.
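One detail worth knowing: an environment created with the standard-library `venv` module uses the same Python version as the interpreter that created it, so getting Python 3.7 alongside a system 3.9 generally means a conda environment (for example, one built from the repository's environment.yml) or a separately installed 3.7 interpreter. A small sketch of a guard you could run inside the activated environment to confirm it is on the 3.7 series that environment.yml pins; `check_dart_python` is a hypothetical helper, not part of DART-ID:

```python
import sys
from typing import Tuple

# environment.yml pins Python 3.7.6; this sketch only checks the major.minor series.
REQUIRED_SERIES: Tuple[int, int] = (3, 7)


def check_dart_python(version_info: Tuple = tuple(sys.version_info)) -> bool:
    """Return True if major.minor matches the required series (hypothetical helper)."""
    return tuple(version_info[:2]) == REQUIRED_SERIES


if __name__ == "__main__":
    found = sys.version.split()[0]
    if check_dart_python():
        print("OK: running Python", found)
    else:
        sys.exit(f"DART-ID expects Python {REQUIRED_SERIES[0]}.{REQUIRED_SERIES[1]}, found {found}")
```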

Let me know if you need any help with this.

Lukas67 commented 1 year ago

Hi,

I have now run the program in a virtual environment, and it works.

I received an error while running my first evidence file:

dart_id -c /home/lukas/Desktop/MS-Data/Lukas/mq-run_150223/combined/txt/config_annotated.yaml -o /home/lukas/Desktop/MS-Data/Lukas/mq-run_150223/combined/txt/output_dart_id
2023-02-27 09:56:22 [ERROR] Number of experiments filter threshold 3 is greater than the number of experiments in the input list. Please provide an integer greater than or equal to 1 and less than the number of experiments with the "num_experiments" key.
Traceback (most recent call last):
  File "/home/lukas/anaconda3/envs/dart_env/bin/dart_id", line 8, in <module>
    sys.exit(main())
  File "/home/lukas/anaconda3/envs/dart_env/lib/python3.7/site-packages/dart_id/update.py", line 355, in main
    df, df_original = process_files(config)
  File "/home/lukas/anaconda3/envs/dart_env/lib/python3.7/site-packages/dart_id/converter.py", line 385, in process_files
    raise ConfigFileError('Number of experiments filter threshold {} is greater than the number of experiments in the input list. Please provide an integer greater than or equal to 1 and less than the number of experiments with the \"num_experiments\" key.'.format(config['num_experiments']))
dart_id.exceptions.ConfigFileError: Number of experiments filter threshold 3 is greater than the number of experiments in the input list. Please provide an integer greater than or equal to 1 and less than the number of experiments with the "num_experiments" key.

I used your default config file and changed only the input to a single evidence file from my first MaxQuant run. Where do I define the num_experiments setting?

Thanks for your help!

BR Lukas

atc3 commented 1 year ago

Hi Lukas,

The num_experiments parameter can be found at the bottom of the config files. See here for example: https://github.com/SlavovLab/DART-ID/blob/85af08c21527687c62307faa1a55e06d99ef0d67/config_files/example_sqc_67_95_varied.yaml#L157
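The error in the log above is raised when num_experiments exceeds the number of experiments found in the input; according to the traceback, the real check is raised from `process_files` in dart_id/converter.py. A rough sketch of the same check in plain Python, assuming experiments are counted as distinct MaxQuant raw files and that the config has already been loaded into a dict; `validate_num_experiments` is a hypothetical name:

```python
from typing import Iterable, Mapping


def validate_num_experiments(config: Mapping, raw_files: Iterable[str]) -> int:
    """Check the num_experiments threshold against the distinct experiments present.

    Assumes one experiment per distinct raw file name (a hypothetical simplification).
    Returns the number of distinct experiments if the threshold is usable.
    """
    n_experiments = len(set(raw_files))
    threshold = config.get("num_experiments")
    if not isinstance(threshold, int) or threshold < 1:
        raise ValueError(f"num_experiments must be an integer >= 1, got {threshold!r}")
    if threshold > n_experiments:
        raise ValueError(
            f"num_experiments={threshold} exceeds the {n_experiments} experiment(s) in the input"
        )
    return n_experiments


if __name__ == "__main__":
    rows = ["run_01", "run_01", "run_01"]  # one raw file, repeated once per PSM
    try:
        validate_num_experiments({"num_experiments": 3}, rows)
    except ValueError as err:
        print(err)  # same situation as the log: threshold 3, only 1 experiment
```

With a single run, lowering the threshold only silences the check; it does not give the model anything to align against.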

More importantly, however, DART-ID can only infer latent retention times by using data from multiple LCMS runs. If you only provide one experiment, there is no statistical power to be gained.

For example, if there is a low-confidence peptide in run A, we can increase confidence in that observation if we see the same peptide at the same RT in run B (and ideally in runs C, ..., N; the more experiments we use, the more power we have).
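To make the run A / run B intuition concrete, here is a toy illustration of Bayes' rule, not DART-ID's actual model. It assumes that the RT of a correct identification is normally distributed around the peptide's reference RT (estimated from the other runs), that the RT of an incorrect identification is uniform over the gradient, and that the search engine's PEP serves as the prior probability of a wrong match. The function name and all parameter values are hypothetical:

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def updated_pep(pep: float, rt: float, ref_rt: float, sigma: float,
                gradient_min: float, gradient_max: float) -> float:
    """Toy posterior error probability after observing retention time rt.

    Assumptions (not DART-ID's actual densities):
      correct ID   -> rt ~ Normal(ref_rt, sigma)
      incorrect ID -> rt ~ Uniform(gradient_min, gradient_max)
    """
    f_correct = normal_pdf(rt, ref_rt, sigma)
    f_incorrect = 1.0 / (gradient_max - gradient_min)
    # Bayes' rule with the original PEP as the prior probability of a wrong match.
    numerator = pep * f_incorrect
    return numerator / (numerator + (1.0 - pep) * f_correct)


if __name__ == "__main__":
    # Hypothetical values: PEP 0.3, 60-minute gradient, reference RT 30 min, sigma 0.5 min.
    print(updated_pep(0.3, 30.2, 30.0, 0.5, 0.0, 60.0))  # RT agrees: PEP drops
    print(updated_pep(0.3, 45.0, 30.0, 0.5, 0.0, 60.0))  # RT disagrees: PEP rises
```

The reference RT is only trustworthy when it is estimated from several other runs, which is why a single run gives the update nothing to work with.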

I would strongly recommend not using DART-ID if you only have one run, and to only use this tool if you have multiple (and ideally many) similarly configured LCMS runs.

If you have any more questions, let me know.

Lukas67 commented 1 year ago

Hi Albert,

thank you for your assistance.

My data were acquired from single-cell monocytes. I think it would be a good idea to align retention times and include your program in my workflow. However, it is only my 4th week in proteomics, and I have successfully acquired only one run with MaxQuant. So the program should still run, just without improving the PSM scores, right? I hope the following error is not caused by observing the same run twice:

  File "/home/lukas/anaconda3/envs/dart_env/lib/python3.7/shutil.py", line 104, in copyfile
    raise SameFileError("{!r} and {!r} are the same file".format(src, dst))
shutil.SameFileError: 'config.yaml' and '/home/lukas/Desktop/MS-Data/Lukas/dart_id/config.yaml' are the same file
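For reference, `shutil.copyfile` raises `shutil.SameFileError` whenever the source and destination resolve to the same file. The paths in the traceback suggest that DART-ID copies the config file into the output folder and that, here, the output folder was the directory that already held config.yaml; that is an inference from the traceback, not something confirmed in this thread. A minimal sketch that reproduces the error in a temporary directory and shows an `os.path.samefile` guard; `copy_config` is a hypothetical helper:

```python
import os
import shutil
import tempfile


def copy_config(config_path: str, output_dir: str) -> str:
    """Copy config_path into output_dir unless it is already that file (hypothetical helper)."""
    dest = os.path.join(output_dir, os.path.basename(config_path))
    if os.path.exists(dest) and os.path.samefile(config_path, dest):
        return dest  # nothing to copy; copyfile would raise SameFileError here
    shutil.copyfile(config_path, dest)
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work_dir:
        config = os.path.join(work_dir, "config.yaml")
        with open(config, "w") as fh:
            fh.write("num_experiments: 3\n")

        try:
            shutil.copyfile(config, os.path.join(work_dir, "config.yaml"))
        except shutil.SameFileError as err:
            print("reproduced:", err)

        print(copy_config(config, work_dir))  # returns the existing path, no error
        out_dir = os.path.join(work_dir, "output_dart_id")
        os.makedirs(out_dir)
        print(copy_config(config, out_dir))  # a separate output folder copies normally
```

In practice, pointing `-o` at a folder other than the one containing the config file should avoid the collision.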

I appreciate your help.

BR Lukas

Lukas67 commented 1 year ago

Hi Albert,

I have one additional question: how similar do runs need to be? Does it rely solely on labeling techniques such as TMT, or is it possible to align runs from different experimental designs? If so, what are the constraints?

BR Lukas

atc3 commented 1 year ago

> Although it is my 4th week in proteomics and I have only acquired 1 run successfully with MaxQuant. Hence the program should work, but with no improvements of PSM scores right?

Do not run this program with just one run: there is no improvement to be gained, and the code relies on multiple experiments (and on PSMs existing across n experiments, as defined by the num_experiments parameter).
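The num_experiments filter mentioned here can be sketched as keeping only peptides observed in at least n distinct experiments. A minimal version over (peptide, experiment) pairs, assuming each PSM has been reduced to its peptide sequence and raw file name; `filter_by_num_experiments` is a hypothetical name, and DART-ID's own filtering may apply additional criteria:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def filter_by_num_experiments(psms: Iterable[Tuple[str, str]], num_experiments: int) -> Set[str]:
    """Return peptides seen in at least num_experiments distinct experiments.

    psms: (peptide_sequence, experiment_id) pairs, one per PSM.
    """
    seen_in: Dict[str, Set[str]] = defaultdict(set)
    for peptide, experiment in psms:
        seen_in[peptide].add(experiment)
    return {pep for pep, exps in seen_in.items() if len(exps) >= num_experiments}


if __name__ == "__main__":
    psms = [
        ("PEPTIDEK", "run_A"), ("PEPTIDEK", "run_B"), ("PEPTIDEK", "run_C"),
        ("SAMPLER", "run_A"), ("SAMPLER", "run_A"),  # repeated PSMs in one run count once
    ]
    print(filter_by_num_experiments(psms, 3))  # {'PEPTIDEK'}
    # With a single run, no peptide can reach a threshold above 1.
    print(filter_by_num_experiments([(p, "run_A") for p, _ in psms], 3))  # set()
```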

> How similar runs need to be. Does it solely rely on labeling techniques such as TMT and else or is it possible to align different experimental designs. If yes, what are the constraints?

There are no constraints on the chemistry of the labeling or LC; the liquid chromatography just has to be consistent. In our paper we use DART-ID on TMT-labeled and label-free runs. However, do not mix runs with different chemistries or chromatography, or even runs acquired far apart in time (and thus not reproducible). For example, do not mix label-free and TMT-labeled runs: TMT labeling chemically modifies the peptides and alters their retention times, whereas DART-ID assumes retention times are consistent enough that each run only needs a small linear adjustment to hit the "true" retention time.
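The "small linear adjustment" assumption can be illustrated with ordinary least squares: for one run, fit observed RT ≈ intercept + slope × reference RT over peptides shared with the reference. This is only a sketch of the idea under that assumption; DART-ID estimates its alignment as part of a Bayesian model rather than with the plain OLS used here, and the function names and example values are hypothetical:

```python
from typing import Sequence, Tuple


def fit_linear_alignment(reference_rt: Sequence[float], run_rt: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of run_rt ~ intercept + slope * reference_rt for one run."""
    n = len(reference_rt)
    if n != len(run_rt) or n < 2:
        raise ValueError("need at least two paired retention times")
    mean_x = sum(reference_rt) / n
    mean_y = sum(run_rt) / n
    sxx = sum((x - mean_x) ** 2 for x in reference_rt)
    if sxx == 0:
        raise ValueError("reference retention times must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference_rt, run_rt))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def to_reference(rt: float, intercept: float, slope: float) -> float:
    """Map an observed RT from this run back onto the reference scale."""
    return (rt - intercept) / slope


if __name__ == "__main__":
    # Hypothetical shared peptides: this run elutes 0.5 min late and is stretched by 2%.
    reference = [10.0, 20.0, 30.0, 40.0]
    observed = [0.5 + 1.02 * x for x in reference]
    a, b = fit_linear_alignment(reference, observed)
    print(round(a, 3), round(b, 3))  # 0.5 1.02
    print(round(to_reference(31.1, a, b), 3))  # 30.0
```

If a run needs a large slope or intercept, or the residuals around the fitted line are not small, that may be a sign it should not be mixed with the others, in line with the advice above.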

Hope this helps