Roleren / ORFik

MIT License

STATS.csv has percentage_aligned_raw greater than 100% #163

Closed. SuhasSrinivasan closed this issue 7 months ago.

SuhasSrinivasan commented 8 months ago

Hello again,

In QC_STATS, STATS.csv contains the summary statistics for trimming and alignment.

However, it is not clear how percentage_aligned_raw is computed or why it exceeds 100%.

When I try to match these values, they do not agree with the statistics in STAR's Log.final.out files.

Attached are examples of this. Please let me know if additional logs are needed. Thank you!

STATS.csv

Cardio_RFP_1_Log.final.txt

Roleren commented 8 months ago

Raw counts are reads, while aligned counts are alignments. Since STAR allows multimappers, one read can produce several alignments, so the ratio can exceed 100%. For mapped reads, see the final CSV file for STAR located outside the aligned folder. Any more questions on that? :)
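A minimal Python sketch of the reads-versus-alignments distinction, assuming percentage_aligned_raw is simply total alignments divided by raw reads, as described in the reply above (this is not taken from ORFik's source). The function name `aligned_vs_raw` and the toy records are hypothetical.

```python
from collections import Counter

def aligned_vs_raw(alignments, n_raw_reads):
    """Return (% of raw reads with >= 1 alignment, % alignments relative to raw reads).

    alignments: iterable of (read_id, locus) pairs, one pair per alignment,
    so a multimapping read contributes several pairs.
    """
    per_read = Counter(read_id for read_id, _locus in alignments)
    mapped_reads = len(per_read)               # distinct reads that aligned at least once
    total_alignments = sum(per_read.values())  # every alignment, multimappers included
    pct_mapped_reads = 100 * mapped_reads / n_raw_reads
    pct_alignments_raw = 100 * total_alignments / n_raw_reads
    return pct_mapped_reads, pct_alignments_raw

# Hypothetical toy data: 4 raw reads; r1 maps once, r2 maps to 3 loci, r3 to 2 loci, r4 is unmapped.
toy = [("r1", "chr1:100"),
       ("r2", "chr2:50"), ("r2", "chr5:900"), ("r2", "chr7:10"),
       ("r3", "chr3:7"), ("r3", "chr9:7")]
print(aligned_vs_raw(toy, n_raw_reads=4))  # (75.0, 150.0)
```

Only 75% of reads map, yet the alignment-based percentage is 150%, because the multimapping reads are counted once per alignment.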

SuhasSrinivasan commented 8 months ago

Thank you for quickly reviewing this!

> Since STAR allows multimappers, you can have more than 100%

The Log.final.out contains this line:

% of reads mapped to multiple loci |    54.01%

So I am still not sure how percentage_aligned_raw comes out as 280.2114.
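To compare STAR's figures with STATS.csv, the `label | value` lines of Log.final.out can be read into a dictionary. A minimal Python sketch, assuming every statistic line has that layout (as in the line quoted above) and that section headers contain no `|`; the function name `parse_star_final_log` is hypothetical. Note that these are per-read percentages, which is why they do not line up directly with an alignment-based value.

```python
def parse_star_final_log(text):
    """Parse 'label | value' lines from a STAR Log.final.out into a dict.

    Lines without '|' (e.g. section headers) are skipped.
    Values ending in '%' become floats; other values are kept as strings.
    """
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        label, value = (part.strip() for part in line.split("|", 1))
        if value.endswith("%"):
            value = float(value[:-1])  # "54.01%" -> 54.01
        stats[label] = value
    return stats

# The single statistic line quoted in this issue, plus a header line.
sample = "MULTI-MAPPING READS:\n     % of reads mapped to multiple loci |\t54.01%\n"
stats = parse_star_final_log(sample)
print(stats["% of reads mapped to multiple loci"])  # 54.01
```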

> see the final csv file for STAR located outside the aligned folder

Thank you, I found the 00_STAR_LOG plot and CSV. However, the value greater than 100% does not appear there :)

Roleren commented 8 months ago

Take a look at the file /aligned/../full_process.csv, which sits outside the aligned folder, relative to the experiment's directory in processed_data.

Here is how you get 280%:

About 70% of raw reads map to the genome, and on average each mapped read has 4 alignments because of multimapping. 70% mapped reads × 4 alignments per mapped read = 280% alignments relative to raw reads.
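The arithmetic above, and its inverse, as a short Python sketch. The 70% and 4 are the rounded figures from the reply; the exact mapped fraction for this sample would have to come from the STAR log. Both function names are hypothetical.

```python
def pct_alignments_of_raw(mapped_fraction, alignments_per_mapped_read):
    """Alignments relative to raw reads, in percent."""
    return 100 * mapped_fraction * alignments_per_mapped_read

def mean_alignments_per_mapped_read(pct_aligned_raw, mapped_fraction):
    """Invert the relation: average number of alignments per mapped read."""
    return pct_aligned_raw / (100 * mapped_fraction)

print(round(pct_alignments_of_raw(0.70, 4), 2))                    # 280.0
print(round(mean_alignments_per_mapped_read(280.2114, 0.70), 2))   # 4.0
```

Read backwards, a percentage_aligned_raw of 280.2114 with roughly 70% of reads mapped implies about 4 alignments per mapped read.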

Roleren commented 7 months ago

Reopen if there is anything else.