Closed kaisengit closed 3 years ago
Hello,
Thanks a lot for the submission of the bug!
What you see in the terminal is just a warning and does not affect how the tool works. The error shown in the GUI means that the program can't extract the columns needed for further analysis and visualization. All columns that must be present in the DIA-NN output are listed in the GUI and in the manual (Protein.Ids, Modified.Sequence, Run). You may also check our test file for DIA-NN.
If your report.tsv file has all of the above-mentioned columns and you still get the same error message, I would be glad if you could share the whole file with us, or even just part of it (several rows together with the column names).
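A quick way to confirm that the three columns named above are present is to read only the header line of the report. This is a minimal sketch, not AlphaMap's own check: the helper name `missing_columns` is hypothetical, and the sample data is a made-up stand-in for a real report.tsv.

```python
import csv
import io

# Column names taken from the reply above; the helper name is hypothetical.
REQUIRED_COLUMNS = ("Protein.Ids", "Modified.Sequence", "Run")

def missing_columns(handle, required=REQUIRED_COLUMNS):
    """Return the required columns absent from a tab-separated file's header."""
    header = next(csv.reader(handle, delimiter="\t"), [])
    return [name for name in required if name not in header]

# Tiny in-memory stand-in for a report.tsv (made-up values).
sample = io.StringIO("Run\tProtein.Ids\tPEP\nrun1\tP01308\t0.01\n")
print(missing_columns(sample))  # ['Modified.Sequence']
```

For a file on disk, the same function can be called with `open("report.tsv", newline="")` in place of the in-memory sample.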
If you still have any other questions, I'm happy to help! 😀
Best Jane
Hi there and thanks for the help! My report.tsv definitely contains the columns you mention. In fact, it contains even more columns than your DIA-NN test data file:
{'First.Protein.Description',
'Lib.PG.Q.Value',
'Lib.PTM.Site.Confidence',
'Ms1.Translated',
'PEP',
'PTM.Informative',
'PTM.Localising',
'PTM.Q.Value',
'PTM.Site.Confidence',
'PTM.Specific'}
These columns are found inside my report.tsv but not in your file. Maybe that is where the problem is coming from? It could be that DIA-NN 1.8 introduced some new columns, I guess. I'll also see if I can prepare a dummy .tsv file to showcase the problem.
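For reference, a comparison like the one above can be reproduced with a set difference over the two header lines. This is only a sketch with made-up in-memory headers; the helper name `header_of` is hypothetical, and the real files would be opened from disk instead.

```python
import csv
import io

def header_of(handle):
    """Return the column names from the first line of a tab-separated file."""
    return set(next(csv.reader(handle, delimiter="\t"), []))

# Made-up stand-ins for the user's report and the test data file.
my_report = io.StringIO("Run\tProtein.Ids\tModified.Sequence\tPEP\tPTM.Q.Value\n")
test_file = io.StringIO("Run\tProtein.Ids\tModified.Sequence\n")

# Columns present in the user's report but not in the test file.
extra = sorted(header_of(my_report) - header_of(test_file))
print(extra)  # ['PEP', 'PTM.Q.Value']
```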
Thanks a lot for your answer! No, new columns shouldn't be a problem. I would be really thankful for a small example file to reproduce the bug that will help us to solve it. 🙏
I have created a minimal example which shows the problem: minimal_report.zip
Thanks a lot for sending your file! I've checked it, and the issue is that the Protein.Ids column needs to contain the UniProt unique entry identifier, e.g. P01308 for the human insulin protein. As I mentioned before, you may always look at the example file. Unique Protein.Ids are needed to combine the user's experimental data in AlphaMap with the previously mined UniProt data. We'll add more detailed information about the content of these columns to the instructions for the supported proteomics software tools to prevent user confusion.
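To spot entries that would fail this requirement, the values in Protein.Ids can be tested against the accession pattern published in the UniProt documentation. The sketch below is not how AlphaMap validates its input: the helper name `invalid_ids` is hypothetical, and it assumes that multiple IDs in one cell are separated by semicolons.

```python
import re

# Accession pattern as published in the UniProt help pages.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

def invalid_ids(protein_ids, sep=";"):
    """Return the entries of one Protein.Ids cell that are not UniProt accessions.

    Assumption: IDs within a cell are separated by `sep`.
    """
    return [
        entry
        for entry in protein_ids.split(sep)
        if not UNIPROT_ACCESSION.fullmatch(entry.strip())
    ]

print(invalid_ids("P01308"))                   # []
print(invalid_ids("P01308;my_custom_entry"))   # ['my_custom_entry']
```

Note that this strict pattern also flags isoform suffixes such as `P01308-2`, so it may need loosening depending on how the search database was built.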
Related to this case, where Protein.Ids does not contain valid IDs: is there any specific rationale for using Protein.Ids and not Protein.Group? Protein.Ids normally contains all mapped proteins, while Protein.Group contains the proteins inferred with a maximum parsimony algorithm.
Our initial idea for AlphaMap was to visualize proteomics data on the peptide level without making any assumptions about protein inference. Protein grouping strategies vary between software tools, and if individual peptides are of interest you might miss important information on possible parental proteins/genes (btw. the hover info shows all protein IDs that a peptide can be mapped to).
That being said, I understand that it might be misleading for the more 'protein focussed' people to have a peptide shown for proteins that are not included in the assigned protein group. We will think about this again - maybe we can introduce a parameter for users to choose whether to show a peptide for all 'Protein.Ids' or only the 'Protein.Group' accessions.
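Purely as an illustration of what such a switch could look like (this is not an existing AlphaMap option; the function name, the `use_protein_group` parameter, and the semicolon separator are all assumptions):

```python
def accessions_for(row, use_protein_group=False, sep=";"):
    """Pick the accessions a peptide is shown for.

    Hypothetical sketch: `row` is a mapping with 'Protein.Ids' and
    'Protein.Group' keys, each holding accessions joined by `sep`.
    """
    column = "Protein.Group" if use_protein_group else "Protein.Ids"
    return [acc for acc in row[column].split(sep) if acc]

# Illustrative row: the peptide maps to two proteins, one of which is
# in the inferred protein group.
row = {"Protein.Ids": "P01308;P01315", "Protein.Group": "P01308"}

print(accessions_for(row))                          # ['P01308', 'P01315']
print(accessions_for(row, use_protein_group=True))  # ['P01308']
```

Defaulting to Protein.Ids keeps the peptide-centric behaviour described above, while the flag narrows the display to the inferred group.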
Oh, makes sense, thank you for the explanation!
That makes sense, thank you for having a look. I used a custom fasta for my search, and I guess this is the reason why the Uniprot IDs were not included in the report. I'll close my bug report then.
Describe the bug When uploading a DIA-NN report.tsv file, the following error is displayed in the terminal:
The error shown in the browser is the following:
In my own testing with DIA-NN output (version 1.8) in pandas, setting low_memory=False fixes the problem.
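A minimal sketch of that workaround, using a made-up in-memory sample in place of a real report.tsv. With `low_memory=False`, pandas parses the file in one pass instead of in chunks, which avoids mixed-type inference within a column; that the terminal message is this kind of dtype warning is an assumption, since its text is not shown here.

```python
import io

import pandas as pd

# Made-up stand-in for a DIA-NN report.tsv; pass a real file path instead.
sample = io.StringIO(
    "Run\tProtein.Ids\tModified.Sequence\n"
    "run1\tP01308\tPEPTIDEK\n"
)

# low_memory=False disables chunked parsing, so each column's dtype is
# inferred from the whole file at once.
report = pd.read_csv(sample, sep="\t", low_memory=False)
print(list(report.columns))  # ['Run', 'Protein.Ids', 'Modified.Sequence']
```

An alternative is to pass an explicit `dtype` mapping for the affected columns, which keeps chunked parsing but fixes the types up front.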
To Reproduce Steps to reproduce the behavior:
Expected behavior No error
Desktop (please complete the following information):