DorresteinLaboratory / NAP_ProteoSAFe

NAP only working on subset of spectra #2

Closed dlforrister closed 4 years ago

dlforrister commented 4 years ago

Good afternoon,

Question regarding Job: ID=914dedb12d0e4ea7aa7f42cbf2667f58

I just re-ran NAP after making some upstream changes to our pipeline, and I am surprised to see that it only made predictions for about 160 compounds on this new dataset. On closer inspection, the predictions cover only a small subset of the data. My data are split into 4 groups in the initial molecular networking: none of the compounds from groups 1 and 2 have any NAP results, whereas compounds in groups 3 and 4 have lots of results. The first explanation I can think of is that the input mgfs for these groups differ slightly in format:

For example, Group 1/2 spectra have headers like this:

    BEGIN IONS
    FEATURE_ID=1.000000
    PEPMASS=1846.919685
    SCANS=1.000000
    RTINSECONDS=1275.030000
    CHARGE=-1
    MSLEVEL=2
    ions...
    END IONS

Group 3/4 spectra (with NAP results) look like this:

    BEGIN IONS
    PEPMASS=261.1329
    CHARGE=1-
    SCANS=1
    ions...
    END IONS

The other difference is that groups 1 and 2 have two hard returns between spectra (i.e. an empty line), whereas groups 3 and 4 have no empty lines between spectra.

Is it possible to look at the latest NAP run (job ID above) and see why all the spectra from groups 1 and 2 were missed? Could it be because of CHARGE=-1 instead of CHARGE=1-, or because of the additional fields in the mgf? I can rewrite the files and re-run to test whether those differences are the cause, but re-running would take a lot of time.

Thank you,

Dale Forrister
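A quick way to compare the formats described above without re-running NAP is to tally the header fields, CHARGE spellings, and blank lines in each mgf. The following is only a minimal sketch: `summarize_mgf` is a hypothetical helper (not part of NAP or GNPS), and it assumes a plain-text mgf with one `KEY=VALUE` header per line.

```python
from collections import Counter


def summarize_mgf(text):
    """Tally header keys, CHARGE spellings, and blank lines in MGF text.

    Hypothetical helper for eyeballing format differences between files.
    Peak lines contain no '=' and are therefore ignored.
    """
    keys = Counter()
    charges = Counter()
    blank_lines = 0
    spectra = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            blank_lines += 1
        elif line == "BEGIN IONS":
            spectra += 1
        elif "=" in line:
            key, value = line.split("=", 1)
            keys[key.upper()] += 1
            if key.upper() == "CHARGE":
                charges[value] += 1
    return {
        "spectra": spectra,
        "keys": dict(keys),
        "charges": dict(charges),
        "blank_lines": blank_lines,
    }
```

Running this on one file from each group and comparing the `keys` and `charges` entries would show at a glance which fields and charge spellings differ between the groups that got NAP results and those that did not.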

rsilvabioinfo commented 4 years ago

Hello Dale,

The charge is not considered by NAP.

I don't have much time to look at your data and improve matching now.

Can you provide specific nodes from the GNPS network where you expect a NAP match, and a structure in your database that you would expect NAP to match?

Also, I suggest that you post your questions here: https://groups.google.com/forum/#!forum/molecular_networking_bug_reports

That way a larger community will see your question and may be able to help, or be helped by it.

Cheers, Ricardo

dlforrister commented 4 years ago

I re-ran after changing all CHARGE=-1 to CHARGE=1- and it worked.

dlforrister commented 4 years ago

Thanks for the info. NAP worked throughout the dataset after I changed the charge value (-1 to 1-) in the original mgfs. In the future, I'll post to the link you sent above. Thanks for sharing it, as I didn't know about it, and thanks for getting back to me.

-DF

-- PhD Candidate Coley/Kursar Lab Department of Biology 257 S 1400 E, University of Utah Salt Lake City, UT 84112
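For anyone hitting the same problem, the fix reported above (rewriting CHARGE=-1 as CHARGE=1- in the mgfs) can be sketched in a few lines of Python. This is an illustration under assumptions, not the script that was actually used: `normalize_charge` is a hypothetical helper, and the thread does not establish why the change helped, given that NAP itself is said not to consider the charge.

```python
import re

# Matches sign-first charges such as CHARGE=-1 or CHARGE=+2.
# Digit-first values such as CHARGE=1- do not match and are left untouched.
_SIGN_FIRST_CHARGE = re.compile(r"^(CHARGE=)([+-])(\d+)\s*$", re.IGNORECASE)


def normalize_charge(text):
    """Rewrite sign-first CHARGE lines (e.g. -1) as digit-first (e.g. 1-).

    Hypothetical helper; all other lines, including blank lines and peaks,
    are passed through unchanged.
    """
    out = []
    for line in text.splitlines():
        match = _SIGN_FIRST_CHARGE.match(line.strip())
        if match:
            prefix, sign, digits = match.groups()
            line = f"{prefix}{digits}{sign}"
        out.append(line)
    # Preserve the trailing newline if the input had one.
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")
```

Applying it to the contents of each mgf and writing the result to a new file (rather than overwriting the original) keeps the untouched input available for comparison.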