Closed jspaezp closed 1 year ago
@jspaezp I like option 2
. What do you think @jpfeuffer @daichengxin @fabianegli ?
Yes maybe we should do "2.", just in case we allow mixed designs at some point.
Name of the option could also be "extension_map(ping)"
I have finished the first iteration of this implementation. @jpfeuffer Can I ask you something, will the name only changed in the openms.tsv
but in the experimentl_design.tsv
they are keep with the original name?
So, to be honest, after thinking about it a bit more, I think the safest way would be to always keep the original filenames everywhere.
BUT this means that you need to pass the design along the workflow for a bit longer, until the actual conversion step. The actual conversion step will then precisely know what it is converting and which filenames to replace in the design.
By the way, has anyone tried with having the "raw" names in the design first? I thought @timosachsenberg implemented a file basename check only in the affected OpenMS tools.
I'm saying this because in n theory, this is all a hack based on the idea "we know that it most likely needs to be converted at some point in the future to mzML", which is not very nice.
@jpfeuffer I guess, this will need some refactoring in quantms and also OpenMS if we don't do conversions?
No, as I said this should be implemented.
Can we continue with the following approach and then do the major refactoring? In any case, this PR only include the possibilityin sdrf-pipelines
to perform conversions in names. For example, if the keep names are allowed, then conversion will not be performed.
I think no changes are necessary but someone has to try.
I actually removed now the option keep raw
in the sdrf-pipelines. That would be the default option. If we want to convert between formats we will need to use option --extension_convert
including raw -> mzML and others.
Hello there!
This fix broke how my data was handled, If I had files that ended with .zip
it would fail but finishing with exit status 0 and generating an empty experimental design.
I am working over here on fixing this, I also noticed that the testing suite always passed because "ERROR" was looked for in the output, but click errored out with the "Error" word, thus it never failed.
Right now this test fails on me, let me know if that is the intended behavior.
def test_validate_srdf(change_test_dir):
"""
:return:
"""
runner = CliRunner()
result = runner.invoke(cli, ["validate-sdrf", "--sdrf_file", f"{TESTDATA_PATH}/PXD000288.sdrf.tsv", "--check_ms"])
assert result.exit_code == 0, result.output
assert "ERROR" not in result.output.upper()
print("validate sdrf " + result.output)
"""
# Currently fails with this error
E AssertionError: The following columns are mandatory and not present in the SDRF: comment[technical replicate] -- ERROR
E The column comment[technical replicate] is not present in the SDRF -- ERROR
E The following columns are mandatory and not present in the SDRF: comment[technical replicate] -- ERROR
E The column comment[technical replicate] is not present in the SDRF -- ERROR
E There were validation errors.
E
E assert True == 0
E + where True = <Result SystemExit(True)>.exit_code
"""
You mean in your experimental design you have data in the way .zip
and you want to convert .zip
-> .d
in the OpenMS experimental design?
Let me go back a bit ... just to make sure we are all in the same page.
If I wanted to set up a workflow with the file https://ftp.pride.ebi.ac.uk/pride/data/archive/2023/05/PXD037164/3817_TIMS2_2col-80m_37_1_Slot1-46_1_4768.d.zip
; then I would build a .sdrf where that URL goes in the comment[file uri]
column; therefore it would make a lot of sense to me that the comment[data file]
column contains the filename from that URI (3817_TIMS2_2col-80m_37_1_Slot1-46_1_4768.d.zip
).
nontheless, since we need the name in the experimental design (experimental_design
, column Spectra_Filepath
) to match the file name that the search engine will recieve, we would need to convert that .zip
to .d
.
'why do it in this stage?': Since there is now a flag in quantms that allow forcing the conversion of .d files to .mzML, makes more sense to me to allow the user to have the same starting .sdrf file, with any extension supported by the pipeline (d.gz, d.tar.gz, .d.tar, .d ...) and have the pipeline translate the extension, instead of having the user create an .sdrf with the "expected extension" of the output.
HAVING SAID THAT! even if that is not something you want, I will split the work on the testing side of things and submit a PR on it.
Ok, the idea is that we allow users to convert:
I actually in the logic validated that we have only some extensions supported like .d -> .d or .raw -> .mzML, etc. I agree with you that we should move to a more generic appraoch when we allow any type of conversion from any file type to any file type. I will do a PR with the new changes for you to review.
Hello,
I am working on a project where I would like to convert an sdrf to an openms workflow where I would like to keep the extension as is, only for some files.
(I am also implementing the workflow... https://github.com/nf-core/quantms/issues/64#issuecomment-1666645298)
Ideas:
I am not that creative with names ...
if --extension_alias is set, it will try to replace with the passed options
and error out if a file does not match
In this example
specs.raw -> specs.mzML
specs.mzml -> specs.mzML
specs.mzML -> specs.mzML
specs.d -> specs.d
specs.docx -> # NOTHINGS, ERROR raised