lparsons closed this 1 month ago
@hepcat72 Can you help me sort out what additional worksheet, etc. I could create to get these files loaded without having to search/replace underscores with dashes? Or perhaps it would just be simpler to patch the matching code?
Interesting. I think fixing the matching code is a good long-term solution. Right now, it pairs a header with a file only when the file's base name (without the extension) exactly matches the header. If you know the pairs, you can supply either a TSV or an Excel file (with a sheet named "Peak Annotation Details") that maps each sample to its header and mzXML file. Once the submission refactor is done, the sheet will be auto-populated (aside from the mzXML), so it's annoying at the moment, but it was written to resolve any case, and this is a supported one. When the header doesn't match the file's base name, the sheet needs 4 columns: sample name, header name, mzXML file name, and annotation file name. A 5th "sequence name" column might currently be necessary*.
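The exact-match rule described above can be sketched roughly as follows. This is a minimal illustration under that description, not TraceBase's actual code; the function name `pair_headers_exact` and the sample names are hypothetical. It strips the extension from each mzXML file name and requires the result to equal a header exactly, so `sample-1.mzXML` does not pair with a header `sample_1`.

```python
from pathlib import Path


def pair_headers_exact(headers, mzxml_files):
    """Hypothetical sketch: pair each header with the mzXML file whose base
    name (file name minus extension) equals the header exactly.

    Returns a dict mapping header -> file name, or None when nothing matches.
    """
    # Index files by base name, e.g. "dir/sample-1.mzXML" -> "sample-1"
    by_stem = {Path(f).stem: f for f in mzxml_files}
    return {h: by_stem.get(h) for h in headers}


pairs = pair_headers_exact(["sample_1", "blank"], ["sample-1.mzXML", "blank.mzXML"])
print(pairs)  # {'sample_1': None, 'blank': 'blank.mzXML'}
```

The `sample_1` header stays unpaired because only the separator differs, which is the situation this issue describes.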
It's probably best to just show you by example. Download this Excel file and look at the "Peak Annotation Details" sheet; that's what you would need. Then you just supply it to the loader: `python manage.py msruns_loader --infile ****HERE**** --mzxml-files *.mzXML`
* I'd like the command line defaults to be more tightly integrated with the "defaults sheet" so that you wouldn't have to enter the sequence name column.
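If you already know the pairs, a mapping file like this could be generated as a TSV with Python's standard `csv` module. This is only a sketch: the column headers in `COLUMNS` are hypothetical placeholders (copy the exact headers from the "Peak Annotation Details" sheet in the example file), and every row value below is made up for illustration.

```python
import csv
import io

# Hypothetical column headers; replace with the exact headers used in the
# "Peak Annotation Details" sheet of the example Excel file.
COLUMNS = [
    "Sample Name",
    "Sample Data Header",
    "mzXML File Name",
    "Peak Annotation File Name",
    "Sequence Name",
]


def write_mapping_tsv(rows, out):
    """Write one tab-separated mapping row per sample to a file-like object.

    Each row is a (sample, header, mzxml, annotation_file, sequence) tuple.
    """
    writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(dict(zip(COLUMNS, row)))


buf = io.StringIO()
write_mapping_tsv(
    # Made-up example: the header uses "_" where the file name uses "-"
    [("sample-1", "sample_1", "sample-1.mzXML", "example_isocorr.xlsx", "example_sequence")],
    buf,
)
print(buf.getvalue())
```

Writing to a real file instead of `io.StringIO` (for example `open("peak_annotation_details.tsv", "w", newline="")`, a hypothetical file name) would produce a file to pass to `--infile`.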
@lparsons - I was working on this today, and just as I was about to write a test for the new code, I noticed that the test mzXML files I already had contained dashes. So I double-checked the conclusion that Maven/El-Maven was swapping dashes for underscores, and discovered that the accucor files had dashes in their headers as well. That makes me question whether Maven/El-Maven is doing this. I noticed that the example data you linked appears to be isocorr (or isoautocorr?). Could that software be doing the dash swap?
OK. It does appear to be isocorr that did this, not Maven/El-Maven.
@hepcat72 I wasn't able to confirm which piece of software changes the sample names, and it could be multiple ones that do.
FEATURE REQUEST
Inspiration
It appears that Maven/El-Maven renames samples that include dashes (`-`) in their filenames, using underscores (`_`) in the sample headers instead. This can be seen in tracebase-dev.princeton.edu:/tracebase-staging/incoming/col013a_perturbative_infusions/
Description
Matching filenames to samples using either dashes (`-`) or underscores (`_`) would simplify the loading process.
Alternatives
Creating a table to match samples to files explicitly is a reasonable workaround to this issue. I'm not sure which solution is preferable.
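For comparison with the explicit-table workaround, the matching proposed in the Description could look something like this minimal sketch. It assumes `-` and `_` are treated as interchangeable only when an exact base-name match fails; the function names are hypothetical, not existing TraceBase code. A header that normalizes to the same name as several files is left unpaired (`None`), so it can still be resolved with an explicit mapping table.

```python
from pathlib import Path


def normalize(name):
    """Treat dashes and underscores as equivalent by mapping '-' to '_'."""
    return name.replace("-", "_")


def pair_headers_tolerant(headers, mzxml_files):
    """Hypothetical sketch: pair headers with mzXML files, treating '-' and
    '_' as the same character.

    Exact base-name matches win; otherwise fall back to the normalized name.
    Ambiguous normalized matches are left as None.
    """
    by_stem = {}
    by_norm = {}
    for f in mzxml_files:
        stem = Path(f).stem
        by_stem[stem] = f
        by_norm.setdefault(normalize(stem), []).append(f)

    pairs = {}
    for h in headers:
        if h in by_stem:
            # Exact match: keep the current behavior
            pairs[h] = by_stem[h]
            continue
        candidates = by_norm.get(normalize(h), [])
        # Accept the fallback only when exactly one file matches
        pairs[h] = candidates[0] if len(candidates) == 1 else None
    return pairs


print(pair_headers_tolerant(["sample_1", "blank"], ["sample-1.mzXML", "blank.mzXML"]))
# {'sample_1': 'sample-1.mzXML', 'blank': 'blank.mzXML'}
```

Preferring exact matches first means existing datasets that already load would pair exactly as before; the fallback only changes results for headers that currently fail to match.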
Dependencies
Comment
We don't necessarily need to change the code to address this problem, but we do need a documented process to handle the issue and get the datasets loaded.
ISSUE OWNER SECTION
Assumptions
Limitations
Affected Components
Requirements
DESIGN
Interface Change description
None provided
Code Change Description
None provided
Tests