Closed DongdongdongW closed 2 years ago
Thanks for the great work. I agree with @ypriverol that it would be better to have one command for both processes. To avoid re-calculating the variant position, we can have a condition to skip the process if the position column already exists in the input_psm_table file.
Also, regarding mzml_path, maybe it is better to change it to mzmls_base_path, since input_psm_table usually contains PSMs from multiple mzML files and the file names are written in one of the columns.
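As a rough illustration of both suggestions, here is a minimal sketch. It assumes the PSM table is tab-separated and that the column holding the mzML file names is called spectra_file; that column name, the helper names, and the example rows are all hypothetical and not taken from pypgatk.

```python
import csv
import io
import os


def has_position_column(psm_table, column="position", delimiter="\t"):
    """Return True if the header row of the PSM table already contains the column."""
    header = next(csv.reader(psm_table, delimiter=delimiter), [])
    return column in header


def resolve_mzml_paths(psm_table, mzmls_base_path,
                       file_column="spectra_file", delimiter="\t"):
    """Join the base directory with every distinct mzML file name in the table.

    file_column is a hypothetical column name used only for this sketch.
    """
    reader = csv.DictReader(psm_table, delimiter=delimiter)
    names = sorted({row[file_column] for row in reader})
    return [os.path.join(mzmls_base_path, name) for name in names]


# Tiny in-memory table standing in for input_psm_table (made-up rows).
example = (
    "sequence\tspectra_file\tposition\n"
    "PEPTIDEK\trun1.mzML\t3\n"
    "PEPTLDEK\trun2.mzML\t5\n"
)

# If this is True, the position calculation could be skipped.
skip_position_step = has_position_column(io.StringIO(example))

# One path per distinct mzML file referenced in the table.
mzml_paths = resolve_mzml_paths(io.StringIO(example), "data/mzml")
```

With a base path, the user passes one directory and the per-PSM file names come from the table itself, which matches the case where one table mixes PSMs from several runs.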
Thank you for the confirmation. At present, they run under one command, but they are still two separate processes. Do you mean we should merge them into one process?
Also, at present mzml_path can already be the path to many mzML files. If necessary, I can rename mzml_path to mzmls_base_path.
No, @DongdongdongW, having only one command is fine as it is now. The only pending task is to support mzTab.
got it
Regarding replacing BLAST for identifying the variant position, we discussed the following with @ypriverol: we can avoid using BLAST by implementing a function to identify proteins that overlap:
@husensofteng, do you mean using our own method to compare peptides against protein sequences?
Yes, if we can have an efficient implementation. ahocorasick is good for exact matches, though I am not sure about its usability for single mismatches.
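One possible way around the single-mismatch concern is a seed-and-verify approach, sketched below. It is not the method used in pypgatk, just an illustration: if a peptide matches a protein with at most one substitution, then at least one of its two halves must match exactly, so exact search on the halves is enough to find candidates, which are then checked residue by residue. The sketch uses plain str.find for the exact step; an Aho-Corasick automaton could be swapped in there when many peptides are searched at once. The function names and the example sequences are made up.

```python
def find_all(text, pattern):
    """Yield every start index of pattern in text (overlaps allowed)."""
    start = text.find(pattern)
    while start != -1:
        yield start
        start = text.find(pattern, start + 1)


def match_with_one_mismatch(peptide, protein):
    """Return sorted (start, mismatch_index) pairs for hits with <= 1 substitution.

    start is the 0-based position of the peptide in the protein.
    mismatch_index is the 0-based position of the differing residue within
    the peptide, or None when the peptide matches exactly.
    """
    n = len(peptide)
    if n < 2:
        raise ValueError("peptide must have at least two residues")
    mid = n // 2
    # With at most one mismatch, one of the two halves is untouched.
    seeds = [(0, peptide[:mid]), (mid, peptide[mid:])]
    hits = {}
    for offset, seed in seeds:
        for seed_start in find_all(protein, seed):
            start = seed_start - offset
            if start < 0 or start + n > len(protein):
                continue  # the full peptide would not fit in the protein here
            window = protein[start:start + n]
            diffs = [i for i in range(n) if window[i] != peptide[i]]
            if len(diffs) <= 1:
                hits[start] = diffs[0] if diffs else None
    return sorted(hits.items())


# Made-up protein; the variant peptide carries A -> V at peptide index 3.
protein = "MKTAYIAKQRQISFVK"
variant_hits = match_with_one_mismatch("AYIVKQR", protein)
```

The mismatch index returned for a hit is the variant position on the peptide, which is the value the position step needs.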
validate_peptides can perform two tasks: one calculates the position of the variant amino acids on the variant peptide, and the other validates the variant peptide using SpectrumAI.
Get position:
pypgatk validate_peptides --input_psm_table xxx --input_fasta xxx --output_psm_table xxx
'--input_psm_table' is the PSMs table for which the position is to be obtained.
'--input_fasta' is the protein sequence used for quantification.
'--output_psm_table' is the file name of the output.
SpectrumAI:
pypgatk validate_peptides --mzml_path xxx --infile_name xxx --outfile_name xxx
or
pypgatk validate_peptides --mzml_files xxx --infile_name xxx --outfile_name xxx
'--mzml_path' is the path to the mzML file in the PSMs table.
'--mzml_files' is the name of the mzML file in the PSMs table (the location of the file must be specified; different files are separated by ',').
'--infile_name' is the PSMs table on which SpectrumAI is to be run. It needs to contain 'position', which can be obtained using the previous command to get the position.
'--outfile_name' is the file name of the output.
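To make the dependency between the two steps explicit, the following sketch builds both command lines so that the output of the position step becomes the input of the SpectrumAI step. The flags are the ones described above; the helper name and all file names are hypothetical placeholders, and the commands are only printed here, not executed.

```python
import shlex


def build_validate_commands(psm_table, fasta, position_table, mzml_path, final_table):
    """Return (position_cmd, spectrumai_cmd) as argument lists.

    position_table is written by the first command and read by the second.
    """
    position_cmd = [
        "pypgatk", "validate_peptides",
        "--input_psm_table", psm_table,
        "--input_fasta", fasta,
        "--output_psm_table", position_table,
    ]
    spectrumai_cmd = [
        "pypgatk", "validate_peptides",
        "--mzml_path", mzml_path,
        "--infile_name", position_table,
        "--outfile_name", final_table,
    ]
    return position_cmd, spectrumai_cmd


# Placeholder file names, chosen only for illustration.
position_cmd, spectrumai_cmd = build_validate_commands(
    "psms.tsv", "proteins.fa", "psms_with_position.tsv", "mzml_dir", "psms_validated.tsv"
)

for cmd in (position_cmd, spectrumai_cmd):
    print(shlex.join(cmd))  # shlex.join needs Python 3.8+
```

Each list could be passed to subprocess.run(cmd, check=True) in order, so that the SpectrumAI step only starts once the table with 'position' has been written.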