ns-rse closed 1 year ago
- Should the output be sorted in any manner, perhaps by the ID column? I don't think it would help; what matters is to lump all the matches together at the top, with the rest after. The example you sent is fine.
- There appear to be a number of rows included where no match has been found; would it be easier/more convenient if they were excluded? NO. The masses which cannot be matched to a searched structure are the interesting ones, because they are the so-called "peptidoglycan dark matter" (they may or may not be novel PG structures).
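The ordering described above (matches together at the top, unmatched rows kept and placed after) could be sketched as follows. This is a minimal illustration and not pgfinder's actual code: the rows, the placeholder structure names and the `matches_first()` helper are all hypothetical, and it assumes an unmatched row is marked by `None` in the inferred structure column.

```python
# Hypothetical rows: None in "Inferred structure" means no match was found.
rows = [
    {"ID": 1, "Inferred structure": None},
    {"ID": 2, "Inferred structure": "structure-A"},
    {"ID": 3, "Inferred structure": None},
    {"ID": 4, "Inferred structure": "structure-B"},
]


def matches_first(rows):
    """Return all rows, matched ones first, keeping the original order within each group."""
    # sorted() is stable and False sorts before True, so matched rows
    # (key False) come first and unmatched rows (key True) follow.
    return sorted(rows, key=lambda row: row["Inferred structure"] is None)


ordered = matches_first(rows)
print([row["ID"] for row in ordered])
```

No rows are dropped, so the unmatched masses remain available for inspection.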
Here's the list of the revised names (in the order they should appear; top is left, bottom is right):
- ID
- Ion count
- Charge state
- XIC start (min)
- XIC end (min)
- RT (min)
- Obs (Da)
- Theo (Da)
- ∆ppm
- Inferred structure
- Intensity
Format is also a bit of an issue. Modifications would be welcome to avoid manual operations:
- XIC start: 2 decimals only
- XIC end: 2 decimals only
- RT: 2 decimals only
- ∆ppm: 1 decimal only
- Intensity: scientific format
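The requested number formats could be applied with Python format specifications, as in the sketch below. The `fmt()` and `format_row()` helpers and the sample values are hypothetical. The number of significant digits for the scientific format is not stated in the request, so the three decimals used here are an assumption.

```python
def fmt(value, spec):
    """Format a number, leaving missing values (e.g. ∆ppm of unmatched rows) empty."""
    return "" if value is None else format(value, spec)


def format_row(row):
    """Return a copy of the row with the requested formats applied."""
    formatted = dict(row)
    for column in ("XIC start (min)", "XIC end (min)", "RT (min)"):
        formatted[column] = fmt(row[column], ".2f")  # 2 decimals only
    formatted["∆ppm"] = fmt(row["∆ppm"], ".1f")  # 1 decimal only
    # Scientific format; 3 decimals in the mantissa is an assumption.
    formatted["Intensity"] = fmt(row["Intensity"], ".3e")
    return formatted


# Made-up values for illustration only.
example = {
    "XIC start (min)": 12.3456,
    "XIC end (min)": 13.0,
    "RT (min)": 12.678,
    "∆ppm": None,
    "Intensity": 1234567,
}
print(format_row(example))
```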
Many of the column names originate from the `ftrs` file and/or database at `data/first_test_data.ftrs`. This sqlite3 database has the following table headers...
```
sqlite> PRAGMA table_info(ChargeClusters);
0|Id|INTEGER|1||1
1|scanIndex|INT|1||0
2|vendorScanNumber|INT|1||0
3|retentionTimeMinutes|REAL|1||0
4|mzFound|REAL|1||0
5|intensity|INT|1||0
6|mwMonoisotopic|REAL|1||0
7|monoOffset|INT|1||0
8|averagineCorrelation|REAL|1||0
9|charge|INT|1||0
10|isotopeCount|INT|1||0
11|scanNoiseFloor|REAL|1||0
12|driftChannel|INT|0||0
13|mobilityScanGroup|INT|0||0
14|mobilityValue|REAL|0||0
sqlite> PRAGMA table_info(FeatureMobilities);
0|Id|INTEGER|1||1
1|feature|INT|1||0
2|charge|INT|1||0
3|mobilityValueStart|REAL|1||0
4|mobilityValueEnd|REAL|1||0
sqlite> PRAGMA table_info(FeatureFinderSettings);
0|Id|INTEGER|1||1
1|parameter|TEXT|1||0
2|value|TEXT|1||0
sqlite> PRAGMA table_info(Features);
0|Id|INTEGER|1||1
1|xicStart|REAL|1||0
2|xicEnd|REAL|1||0
3|apexRetentionTimeMinutes|REAL|1||0
4|feature|INT|1||0
5|apexMwMonoisotopic|REAL|1||0
6|maxAveragineCorrelation|REAL|1||0
7|maxIntensity|INT|1||0
8|ionCount|INT|1||0
9|chargeOrder|TEXT|1||0
10|maxIsotopeCount|INT|1||0
```
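The same table headers can be listed from Python with the standard-library `sqlite3` module. The sketch below is only an illustration: `table_columns()` is a hypothetical helper, and an in-memory database with a cut-down `Features` table stands in for the real `.ftrs` file.

```python
import sqlite3


def table_columns(connection, table):
    """Return (name, type) pairs for each column of `table`."""
    # The table name is interpolated into the statement, so only pass trusted names.
    cursor = connection.execute(f"PRAGMA table_info({table})")
    # Each result row is (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in cursor]


# Stand-in for the real file: only three of the Features columns are created here.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Features ("
    "Id INTEGER PRIMARY KEY NOT NULL, xicStart REAL NOT NULL, xicEnd REAL NOT NULL)"
)
print(table_columns(connection, "Features"))
```

To inspect the real file, the connection would instead be opened on its path, for example `sqlite3.connect("data/first_test_data.ftrs")`.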
Input files also have headers which may be the source of variable/column names. The example `maxquant_test_data.txt` has the following headers...
```
❱ head tmp/maxquant_test_data.txt -n1 | sed 's/\t/\n/g'
Raw file
Type
Charge
m/z
Mass
Uncalibrated m/z
Resolution
Number of data points
Number of scans
Number of isotopic peaks
PIF
Mass fractional part
Mass deficit
Mass precision [ppm]
Max intensity m/z 0
Retention time
Retention length
Retention length (FWHM)
Min scan number
Max scan number
Identified
MS/MS IDs
Sequence
Length
Modifications
Modified sequence
Proteins
Score
Intensity
Intensities
Isotope pattern
MS/MS Count
MSMS Scan Numbers
MSMS Isotope Indices
```
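The header line of a tab-delimited input file can also be read from Python, which may be handy when checking column names in tests. This is a sketch using the standard-library `csv` module; `read_headers()` is a hypothetical helper, and the in-memory sample (including its data row) is made up for illustration.

```python
import csv
import io


def read_headers(handle):
    """Return the column names from the first line of a tab-delimited file."""
    return next(csv.reader(handle, delimiter="\t"))


# Made-up sample using the first five header names listed above.
sample = io.StringIO("Raw file\tType\tCharge\tm/z\tMass\nexample\tMULTI\t2\t500.0\t998.0\n")
for name in read_headers(sample):
    print(name)
```

For a file on disk, the handle would come from `open(path, newline="")` instead of `io.StringIO`.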
Looking through the code, `pgio.ftrs_reader()` appears to pick up most of the features, so they stem from the database table `Features` (see above). However, this may sometimes be a file uploaded by users, so the defaults as well as the input files will need changing.
Mapping columns (order is also indicated by the rows in the table below)...

| Current | New | Source |
|---|---|---|
| `ID` | ID | Input / `Features` table |
| `ionCount` | Ion count | Input / `Features` table |
| `chargeOrder` | Charge state | Input / `Features` table |
| `xicStart` | XIC start (min) | Input / `Features` table |
| `xicEnd` | XIC end (min) | Input / `Features` table |
| `rt` | RT (min) | Input (`Retention time`) |
| `mwMonoisotopic` | Obs (Da) | Input / `Features` table |
| `theo_mwMonoisotopic` | Theo (Da) | Derived (`pgfinder.pgio.ftrs_reader()`) |
| `diff_ppm` | ∆ppm | Derived (`pgfinder.matching.calculate_ppm_delta()`) |
| `inferredStructure` | Inferred structure | Derived |
| `maxIntensity` | Intensity | Input / `Features` table |
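One way to apply this mapping is a single ordered dictionary that drives both the renaming and the column order. The sketch below is an assumption about how it might be done, not the existing pgfinder code: `COLUMN_NAMES` and `rename_columns()` are hypothetical names and the example row is made up.

```python
# Current name -> new name, listed in the order the columns should appear.
COLUMN_NAMES = {
    "ID": "ID",
    "ionCount": "Ion count",
    "chargeOrder": "Charge state",
    "xicStart": "XIC start (min)",
    "xicEnd": "XIC end (min)",
    "rt": "RT (min)",
    "mwMonoisotopic": "Obs (Da)",
    "theo_mwMonoisotopic": "Theo (Da)",
    "diff_ppm": "∆ppm",
    "inferredStructure": "Inferred structure",
    "maxIntensity": "Intensity",
}


def rename_columns(row):
    """Return a new row with renamed keys in the agreed column order."""
    # Dicts preserve insertion order, so iterating COLUMN_NAMES fixes the order.
    # Columns absent from the row become None rather than raising KeyError.
    return {new: row.get(old) for old, new in COLUMN_NAMES.items()}


# Made-up partial row, deliberately out of order.
row = {"maxIntensity": 1234567, "ID": 1, "rt": 12.68}
renamed = rename_columns(row)
print(list(renamed))
```

Keeping the names and the order in one place means a later change to either only needs to be made once.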
The order of columns is defined in `pgfinder/pgio.ftrs_reader()` (line 109).
Does the comment above await a response? Sounds like you've worked it out??
Sorry, no response required, I was just making notes for when I get round to making the changes. Looking at this again now.
Currently the column names in the output CSV files are...
These could be improved.