cov-lineages / pangolin

Software package for assigning SARS-CoV-2 genome sequences to global lineages.
GNU General Public License v3.0

Example output columns shifted #531

Closed pontushojer closed 10 months ago

pontushojer commented 10 months ago

I am working a bit on the pangolin MultiQC module and we have a test CSV that is based on the example table posted in your docs: https://cov-lineages.org/resources/pangolin/output.html

[Image: screenshot of the example output table from the docs]

I noticed that some of the sample columns here are not assigned to the proper header, and I see that this is also the case in your example output. Look at Virus3-8: the columns from version onwards are shifted to the left.

Just wanted to check whether this might happen in the actual output CSV or whether it is just, as I would assume, a mistake in the docs.

wm75 commented 10 months ago

I have never observed such a column shift before, so I'd say this is likely a formatting issue in the docs.

Very nice to see that MultiQC support for pangolin output is getting worked on :) If it helps you with testing: https://usegalaxy.eu/u/wolfgang-maier/h/pangolin-results has two really large pangolin results files (> 300,000 samples each), one produced in usher mode, one in the now deprecated pangolearn mode. Presumably, these should capture all sorts of possible report lines. Note that both were produced with the --expanded-lineage option, which adds the last column to the output.

wm75 commented 10 months ago

The two finished datasets in the Galaxy history above were produced with pangolin version 4.0.5. I've just now triggered a run of version 4.3 (in usher mode) on the same data in the same history. It will turn from yellow to green when it's finished.

aineniamh commented 10 months ago

lineage_report.csv

Hi, in the latest pangolin version the above is the output file, which appears to be the same as the docs and your example. I might be missing something though, so please point out any changes I'm not noticing!

Also, I would usually recommend using a CSV parser that doesn't rely on column position in a CSV file.
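As a minimal sketch of that recommendation, Python's standard `csv.DictReader` looks fields up by header name, so a parser built on it does not depend on column order. The `read_report` helper, the sample rows, and the `taxon` and `lineage` column names are assumptions made for illustration; only `version` and `scorpio_notes` are named in this thread.

```python
import csv
import io

# Made-up miniature report for illustration only. A real lineage_report.csv
# has more columns; "taxon" and "lineage" are assumed names, and the values
# are placeholders rather than real pangolin output.
SAMPLE = """taxon,lineage,scorpio_notes,version
Virus1,B.1.1.7,,example-version
Virus2,B.1,,example-version
"""


def read_report(handle, wanted=("taxon", "lineage", "version")):
    """Return one dict per row, keyed by header name rather than position."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    # Fail early if an expected column is absent instead of guessing by index.
    missing = [name for name in wanted if name not in header]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return [{name: row[name] for name in wanted} for row in reader]


rows = read_report(io.StringIO(SAMPLE))
```

Because lookups go through the header, the same helper returns the same values if the columns arrive in a different order.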

wm75 commented 10 months ago

@aineniamh your file looks correct, but in the screenshot from the docs above, from line 3 onwards, columns are shifted to the left (version column content is moved to scorpio_notes).
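If a consumer wanted to guard against a shift like this in a real CSV, one option is to flag rows whose field count differs from the header. This is only a sketch: it assumes a shift would show up as missing or extra fields, `find_ragged_rows` is a hypothetical helper, and the sample data is made up.

```python
import csv
import io


def find_ragged_rows(handle):
    """Return (line_number, field_count) for rows that do not match the header width."""
    reader = csv.reader(handle)
    header = next(reader)
    return [
        (reader.line_num, len(row))
        for row in reader
        if len(row) != len(header)
    ]


# Made-up example: the second data row is one field short.
example = "taxon,scorpio_notes,version\nVirus1,,v1\nVirus2,v1\n"
ragged = find_ragged_rows(io.StringIO(example))
```

An empty result means every row has as many fields as the header; it does not prove that each value sits under the right column.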

aineniamh commented 10 months ago

Ah, I see! Didn't realise it was the content rather than the headers! Yeah, typo in the docs!

pontushojer commented 10 months ago

Thanks for the help! Good to hear that it's just a typo :) Will definitely check out the large datasets you mentioned; it would be interesting to see how MultiQC handles that many samples.

aineniamh commented 10 months ago

Docs fixed now too, I believe: https://cov-lineages.org/resources/pangolin/output.html. The table there is coded in HTML, so it's not a real output file produced by pangolin.