Open xavierrocarada opened 3 years ago
@xavierrocarada can you send the entire MultiQC results folder? And also, if you still have it, the .nextflow.log of that run?
multiqc.zip @jfy133 Thanks for having a look at it! :) I have attached the multiqc results folder, but I am sorry to say that I do not have the .nextflow.log of that run... :(
Ok, so the information IS in multiqc_data.json
"22219": {
"endogenous_dna": 7.429926,
"endogenous_dna_post": 5.54621
},
However, when I search for those two values in the exported CSV/TSV files, those values are associated with: 22204_S57_L001_R2_001.
Sample | % Duplicate Reads | Average % GC Content | Average Sequence Length (bp) | Percentage of modules failed in FastQC report (includes those not plotted here) | Total Sequences () | Duplication rate before filtering | Percentage of reads > Q30 after filtering | Bases > Q30 after filtering (millions) | GC content after filtering | Percent reads passing filter | % trimmed reads | Total trimmed reads () | % Duplicate Reads | Average % GC Content | Average Sequence Length (bp) | Percentage of modules failed in FastQC report (includes those not plotted here) | Total Sequences () | Total reads in the bam file () | Reads Mapped in the bam file () | Total reads in the bam file () | Reads Mapped in the bam file () | Percentage of reads categorised as a technical duplicate | CF~1 means high library complexity. Large CF means not worth sequencing deeper. | Non-unique reads removed after deduplication () | Unique mapping reads after deduplication () | 3 Prime 1st base substitution frequency for G>A | 3 Prime 2nd base substitution frequency for G>A | 5 Prime 1st base substitution frequency for C>T | 5 Prime 2nd base substitution frequency for C>T | Read length std. dev. | Median read length | Mean read length | Average coverage (X) on mitochondrial genome. | Average coverage (X) on nuclear genome. | Mitochondrial to nuclear reads ratio (MTNUC) | Reads on the nuclear genome () | Reads on the mitochondrial genome () | Mean GC content | Fraction of genome with at least 1X coverage | Fraction of genome with at least 2X coverage | Fraction of genome with at least 3X coverage | Fraction of genome with at least 4X coverage | Fraction of genome with at least 5X coverage | Median coverage | Mean coverage | % mapped reads | Number of mapped reads () | Number of reads () | Alignment error rate. Total edit distance (SAM NM field) over the number of mapped bases | Rate of Error for Chr X | Rate of Error for Chr Y | Number of positions on Chromosome X vs Autosomal positions. | Number of positions on Chromosome Y vs Autosomal positions. | #SNPs Covered | #SNPs Total | Endogenous DNA (%) | Endogenous DNA Post (%) | Number of SNPs | Contamination Estimate (Method1_MOM) | Estimate Error (Method1_MOM) | Contamination Estimate (Method1_ML) | Estimate Error (Method1_ML) | Contamination Estimate (Method2_MOM) | Estimate Error (Method2_MOM) | Contamination Estimate (Method2_ML) | Estimate Error (Method2_ML) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
22204_S57_L001_R2_001 | 6.93469429562835 | 49 | 101 | 18.1818181818182 | 8428752 | 7.31127 | 92.6158 | 1395.931856 | 55.7652 | 99.885818341311 | 98.3596809666944 | 18904052 | 1.52336861404443 | 57 | 46.8097996027694 | 9.09090909090909 | 1229009 | 13249591 | 8502941 | 24595 | 24595 | 9 | 1.09 | 152467 | 1497370 | 12.4848190429925 | 2.03599881971083 | 11.587147030185 | 1.98637911464245 | 18.4223847868222 | 43 | 47.9921496473478 | 0.028788701792504 | 0.000235751625664 | 122.11 | 15783 | 9 | 46.6830699774266 | 0.221776984778939 | 0.000756186263807 | 0.000108272486329 | 5.0869263521002E-05 | 2.97056100260488E-05 | 0 | 0.0022 | 100 | 140126 | 140126 | 0.45 | 0.034336204644391 | 0.057566950886416 | 0.682406704462112 | 0.181985732122619 | 88628 | 53227092 | 7.429926 | 5.54621 | 1 | 0 | N/A | 0 | 0 | 0 | N/A | 0 | 0 |
But when I look at the JSON that you can export, the values look correct:
> library(jsonlite)
> res <- read_json("~/Downloads/general_stats_table.json")
> res$categories ## here I scrolled to find the endorspy pre/post columns, which is under element '56'
> res$samples[[56]][[51]] ## find the sample names in column 56, I identify 22219 in 51
[1] "22219"
> res$datasets[[56]][[51]]
[1] 7.429926
> res$datasets[[57]][[51]]
[1] 5.54621
So indeed, I think there is something funky going on in the generalstats export table... will need to wait for Phil unfortunately :\
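For anyone who prefers Python, the same lookup might be sketched roughly as below. It assumes the exported general_stats_table.json has parallel `samples` and `datasets` lists with one entry per column, which is only inferred from the R session above. The helper name `values_for_sample` and the tiny example data are made up for illustration. Note that Python indices start at 0, so R's column 56 would be index 55 here.

```python
import json


def values_for_sample(export, sample_name):
    """Return {column_index: value} for one sample, matching by name per column."""
    found = {}
    for col, (names, values) in enumerate(zip(export["samples"], export["datasets"])):
        # Each column carries its own list of sample names, so look the name up
        # per column instead of assuming all columns share one row order.
        if sample_name in names:
            found[col] = values[names.index(sample_name)]
    return found


# Made-up miniature export: the two columns list their samples in different orders.
example = {
    "samples": [["22204", "22219"], ["22219", "22204"]],
    "datasets": [[1.5, 7.429926], [5.54621, 0.9]],
}
print(values_for_sample(example, "22219"))

# With a real export (path is an assumption):
# with open("general_stats_table.json") as fh:
#     print(values_for_sample(json.load(fh), "22219"))
```

Matching by name in every column is the point of the sketch: if columns do not all list samples in the same order, reading across a fixed row position would attach values to the wrong sample.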
James, thank you very much for having a look at this! Is there a way to get the general stats from the multiqc_data.json file? Or is it better to wait for Phil to have a look at it?
If you're familiar with R and JSON files, you can reconstruct it (my R example above gives you the general gist).
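A rough Python sketch of that reconstruction is below. It assumes the general stats sit in multiqc_data.json as a collection of per-module dictionaries shaped like the `"22219": {...}` snippet earlier in the thread. The key name `report_general_stats_data` in the commented usage is an assumption to check against your own file, and the function names and the duplicate-rate values in the example are made up.

```python
import csv
import io
import json


def merge_general_stats(sections):
    """Merge per-module {sample: {metric: value}} dicts into one row per sample."""
    # Assumption: the sections may be a list or a dict of modules; accept both.
    if isinstance(sections, dict):
        sections = list(sections.values())
    rows, columns = {}, []
    for index, section in enumerate(sections):
        # Metric names used in this section, in first-seen order.
        names = []
        for metrics in section.values():
            for metric in metrics:
                if metric not in names:
                    names.append(metric)
        # Suffix names an earlier section already used so nothing is overwritten.
        rename = {m: f"{m}_{index}" if m in columns else m for m in names}
        columns.extend(rename[m] for m in names)
        for sample, metrics in section.items():
            row = rows.setdefault(sample, {})
            for metric, value in metrics.items():
                row[rename[metric]] = value
    return rows, columns


def to_tsv(rows, columns):
    """Render the merged table as tab-separated text, blank where a value is absent."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Sample"] + columns)
    for sample in sorted(rows):
        writer.writerow([sample] + [rows[sample].get(c, "") for c in columns])
    return out.getvalue()


# Example: the first section is from this thread, the second is invented.
sections = [
    {"22219": {"endogenous_dna": 7.429926, "endogenous_dna_post": 5.54621}},
    {"22219": {"percent_duplicates": 6.9}, "22204": {"percent_duplicates": 1.5}},
]
rows, columns = merge_general_stats(sections)
print(to_tsv(rows, columns))

# With a real file (path and key name are assumptions):
# with open("multiqc_data/multiqc_data.json") as fh:
#     rows, columns = merge_general_stats(json.load(fh)["report_general_stats_data"])
```

Rows are keyed by sample name throughout, so a value can only ever land on the sample it was stored under.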
Hej @xavierrocarada, could you please let me know how you created the .csv file? Is there an option in MultiQC? I have also been trying to do that. Thanks!
@ewels & @jfy133 maybe you would also know how to export this .csv file?
The toolbox on the right side of the MultiQC report should give you an option somewhere to export the data.
I could not find anything in the toolbox, only the option to download plot data. The only way I could find was to copy the general stats table (copy button) and paste it into an empty document, but that is not very practical...
Oops, sorry, I probably should have double-checked and not replied from my phone. You're right, I guess there is only the copy button for the general stats table (or just use the file multiqc_data/mqc_general_stats.txt, if it exists).
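If that file exists, reading it in Python could look something like the sketch below. It assumes the file is tab-separated with a first column named `Sample`. The function name `read_general_stats` and the column names and rows in the demo are placeholders, not taken from a real report.

```python
import csv
import io


def read_general_stats(handle):
    """Parse a tab-separated general stats table into {sample: {column: value}}."""
    reader = csv.DictReader(handle, delimiter="\t")
    table = {}
    for row in reader:
        sample = row.pop("Sample")  # assumed name of the first column
        # Drop empty cells so a missing metric is visibly absent for that sample.
        table[sample] = {k: v for k, v in row.items() if v not in ("", None)}
    return table


# Demo with placeholder column names; values stay as strings, as read from the file.
demo = io.StringIO(
    "Sample\tendogenous_dna\tendogenous_dna_post\n"
    "22219\t7.429926\t5.54621\n"
    "22204\t\t\n"
)
stats = read_general_stats(demo)
print(stats["22219"])

# With the real file (path as mentioned above, if it exists):
# with open("multiqc_data/mqc_general_stats.txt", newline="") as fh:
#     stats = read_general_stats(fh)
```

Checking which samples come back with an empty dictionary for a given metric is a quick way to see whether values are truly absent or were only lost in the export.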
Sorry for the late reply. If you use the toolbox on the right side, there is an export tab. At the top there are two tabs: Images and Data. If you press "Data" you can download the general_stats_table in three different formats: tab-separated, comma-separated and JSON. If you select comma-separated, you'll get a .csv file. Yes, you can just copy the general stats table if you have it, but you can't do that if you have a beeswarm plot because some data will be missing in the exported file... This bug has not been fixed yet...
@xavierrocarada I also thought that but I don't see that option in 3 different examples I've looked at today (unless I'm being blind):
There are all the plots but not general stats
True, you do not have this option if you already have a table that you can copy. However, if you have a beeswarm plot, you have this option:
Hello!
Thanks very much for all the help. We have found the multiqc_data/mqc_general_stats.txt file and it is just what we were looking for!
Best regards, George.
Hi @ewels,
Is there any update on this issue? I have a whole new dataset and the problem is still happening.
Cheers, Xavier
Not yet sorry. Maybe @ErikDanielsson this is something that you could take a look at?
Sure!
Hi all,
Coming back to this now. Does anyone have some input data to reproduce this error? It's great that we have the MultiQC outputs attached, but ideally I want to be able to run MultiQC myself and observe the error. Then I can work on fixing it.
The example report seems to have been generated with MultiQC v1.10.1 and a lot has changed since then.
Phil
Description of bug
general_stats_table.csv PT_aDNA_1_multiqc_report.html.zip I have attached the HTML report and the exported general stats table in comma-separated format. All the samples are in the CSV file, but some of them have no value in some columns, and I know that these values have been calculated because I have seen them in the beeswarm plot. For instance, sample 22219 has no value in the Endogenous DNA (%) column, yet the value is plotted in the beeswarm. Why is data missing from the exported file?
File that triggers the error
No response
MultiQC Error log
No response