Closed davidecarlson closed 8 months ago
Note that I asked about this on the nf-core/quantms slack as well. But since the issue occurs even when run manually outside of the pipeline, I figured maybe I should ask here.
I will double-check and get back to you today. Can you provide the full SDRF?
Thanks a lot! I'm attaching the full SDRF (file extension changed from tsv to txt for Github compatibility).
I'm very new to MS-based proteomics, so it's very possible I've introduced some sort of error in the SDRF file.
Before testing, can I ask why no PTMs were added in the SDRF?
The only reason is that I'm processing data collected by someone else, and I was not given any information regarding post-translational modifications.
Normally, oxidation of methionine is allowed as a variable modification, and in almost 99% of cases Carbamidomethyl (C) is also allowed.
Thanks. I will add those variables.
Do you think that is related to the error I'm getting, or is including them just more of a best practice?
Thanks! Dave
Both. It is good practice because I'm almost 100% sure you will need them, and it could be the source of the error. We don't have a good error message system in the released version of sdrf-pipelines, but we are working to improve it.
Okay, thanks. I will add that and report back on the result.
I've added two new protein modification columns:
comment[modification parameters]	comment[modification parameters]
NT=Oxidation;MT=Variable;TA=M;AC=UNIMOD:35	NT=Carbamidomethyl;AC=UNIMOD:4;TA=C;MT=Fixed
However, I'm still seeing the same error message when running parse_sdrf:
[~]$ parse_sdrf validate-sdrf --sdrf_file my_SDRF.tsv
Everything seems to be fine. Well done.
[~]$ parse_sdrf convert-openms -t2 -l -s my_SDRF.tsv
PROCESSING: DeRosa_sdrf.tsv"
Factor columns: ['factor value[phenotype]']
Characteristics columns (those covered by factor columns removed): ['characteristics[organism]', 'characteristics[organism part]', 'characteristics[cell type]', 'characteristics[disease]', 'characteristics[biological replicate]']
Error: 'NoneType' object has no attribute 'group'
Any additional suggestions?
My updated SDRF file is attached.
Thanks! Dave my_SDRF.txt
Okay, I was able to track down the error.
It seems that, contrary to the docs, the value of comment[cleavage agent details] cannot be set to "not applicable".
After replacing "not applicable" with a placeholder value (in this case NT=Trypsin; AC=MS:1001251; CS=(?<=[KR])(?!P)), the error goes away.
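For anyone else new to this, the CS part of that placeholder is a regular expression describing where the enzyme cuts: after K or R, unless the next residue is P. Below is a minimal sketch of what the pattern does, using Python's `re` module. The helper name `digest` and the example sequence are made up for illustration and are not part of sdrf-pipelines or OpenMS.

```python
import re

# Cleavage-site pattern taken from the CS field: a zero-width position
# that follows K or R and is not followed by P.
TRYPSIN_CS = r"(?<=[KR])(?!P)"


def digest(sequence, cleavage_pattern=TRYPSIN_CS):
    """Split a sequence at every cleavage site (hypothetical helper)."""
    # re.split supports zero-width matches on Python 3.7 and later.
    # Empty pieces (e.g. when the sequence ends in K or R) are dropped.
    return [piece for piece in re.split(cleavage_pattern, sequence) if piece]


# Made-up sequence: the first K is followed by P, so no cut happens there.
peptides = digest("AAKPGGRCCKDD")
print(peptides)
```

This only illustrates the regex itself; how the search engines apply the enzyme definition (missed cleavages, terminal handling) is outside this sketch.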
In my case, I am not certain if a cleavage agent was used or not, but I will try to find out from the core facility that generated the data.
Best, Dave
Good find! I believe this should be a bug then. I just renamed the title.
Okay, good to know that it's a bug. In that case, I believe the error is introduced by the re.search call that parses the comment[cleavage agent details] column.
The re.search assumes that the value in the comment[cleavage agent details] column includes an "NT". If the regex search returns an empty result, the group method fails and throws an error.
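The failure can be reproduced outside of the pipeline. The following is a minimal sketch and not the actual sdrf-pipelines code: the pattern `NT=(.+?)(;|$)` and the helper name `extract_enzyme_name` are assumptions made for this example.

```python
import re

# Hypothetical pattern for pulling the NT (name) field out of a
# key=value cell; the real pattern in sdrf-pipelines may differ.
NT_PATTERN = re.compile(r"NT=(.+?)(;|$)")


def extract_enzyme_name(cell):
    """Return the NT value of a cleavage agent cell, or None if absent."""
    match = NT_PATTERN.search(cell)
    if match is None:
        # re.search returns None when nothing matches, so the result
        # has to be checked before calling .group() on it.
        return None
    return match.group(1).strip()


trypsin = extract_enzyme_name("NT=Trypsin; AC=MS:1001251")
missing = extract_enzyme_name("not applicable")

# Unguarded call, reproducing the reported error message.
try:
    re.search(r"NT=(.+?)(;|$)", "not applicable").group(1)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(trypsin, missing, error_message)
```

With a guard like this, the caller could raise a descriptive error naming the offending column instead of surfacing the bare AttributeError.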
But the docs are also misleading and should be reworked. They say the "NT" part is mandatory, so I am confused myself about how one would specify "no cleavage": would it be "NT=not applicable" or just "not applicable"? In any case, even if it could parse "NT=not applicable", it would probably fail, since "not applicable" is not part of the mapping in https://github.com/bigbio/sdrf-pipelines/blob/f19d38ca2b6d51cce3cab9c3f8921ad2219fee80/sdrf_pipelines/openms/openms.py#L77
ping @ypriverol
"Not applicable" is valid for SDRF but not for processing with quantms, because all pipelines currently supported in quantms are enzyme-specific.
For you @davidecarlson this also means you could emulate a no cleavage behaviour by specifying "NT=No cleavage".
@ypriverol They are only tested with enzymes for now but I don't see anything speaking against not using an enzyme. All search engines support this.
Thanks, guys!
From my perspective, this can be closed. But I will leave it open in case there is more to discuss.
I appreciate the assistance.
Best, Dave
@jpfeuffer I have already tested the pipeline without an enzyme and it doesn't work. I can try to find the issue. Neither case works: no enzyme or multiple enzymes.
@davidecarlson keep us posted with your results from quantms, we want to see how the pipeline works for others. Thanks for using the workflow.
I will close the issue.
Hi All,
I'm trying to run parse_sdrf convert-openms on my SDRF file, but I'm getting an error that I don't fully understand, particularly since the validation tool doesn't indicate any obvious issues with my input file. Here are the commands I've run along with the output:
Here are the first five lines of my input file. I can attach the full thing if it would be useful, but the issue arises even using only this portion of the input.
Any ideas on what I'm doing wrong?
Thank you! Dave