alexdobin / STAR

RNA-seq aligner
MIT License

No RG tag when mapping using manifest #1254

Status: Open. J-Moravec opened this issue 3 years ago.

J-Moravec commented 3 years ago

When mapping multiple FASTQ files with --readFilesIn together with --outSAMattrRGline, the RG tag is added both to the SAM header and to every read.

However, when mapping with a manifest via --readFilesManifest, the RG tag is added ONLY to the SAM header, not to the reads! This makes it impossible to tell which input file (read group) each read came from.

This can be fixed by adding --outSAMattributes RG, but this behaviour is confusing, unexpected and undocumented.
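A quick way to check whether a run is affected is to compare the @RG header lines with the alignment records that actually carry an RG:Z: tag. The sketch below is a hypothetical helper (rg_summary is not part of STAR) that reads plain SAM text, relying only on the SAM layout: header lines start with "@", and optional TAG:TYPE:VALUE fields follow the 11 mandatory columns. For BAM output the text would first have to be produced, e.g. with samtools view -h.

```python
from collections import Counter


def rg_summary(sam_lines):
    """Return (@RG header IDs, per-read-group record counts, untagged record count).

    Hypothetical helper. Header lines start with '@'; alignment records have
    11 mandatory tab-separated columns followed by optional TAG:TYPE:VALUE fields.
    """
    header_ids = []
    per_group = Counter()
    untagged = 0
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Collect the ID: field of every @RG header line.
            if line.startswith("@RG\t"):
                for field in line.split("\t")[1:]:
                    if field.startswith("ID:"):
                        header_ids.append(field[3:])
            continue
        # Optional fields start at column 12 (index 11).
        optional = line.split("\t")[11:]
        rg = next((f[5:] for f in optional if f.startswith("RG:Z:")), None)
        if rg is None:
            untagged += 1
        else:
            per_group[rg] += 1
    return header_ids, per_group, untagged
```

For example, with open("Aligned.out.sam") as fh: print(rg_summary(fh)). If @RG IDs are present in the header but every record is counted as untagged, the output shows exactly the symptom described here.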

Suggestion: if the third (read group) column of the manifest file is provided, just add the RG tag to the reads. This would make the behaviour consistent regardless of whether the files are given through --readFilesIn or through --readFilesManifest.

This issue was reported earlier and marked as solved, since --outSAMattributes RG does fix the problem. But what is the point of the @RG header line if the reads are not tagged?

https://github.com/alexdobin/STAR/issues/1089
https://github.com/alexdobin/STAR/issues/1145

(btw STAR & STARsolo rocks!)

alexdobin commented 3 years ago

Hi Jiří

Thanks for the feedback! I agree that the documentation of this behavior is lacking, but I think the behavior itself makes sense. The output of SAM attributes is controlled by --outSAMattributes, so RG has to be specified there. And you are right, if RG is not requested, it should not go to the SAM header. I would rather not make the behavior depend on the formatting of the input files.

Cheers Alex

J-Moravec commented 3 years ago

Hi Alex, thanks for the response.

What do you think about the inconsistency between --readFilesIn and --readFilesManifest regarding the RG tag in reads? Do you consider --outSAMattrRGline an implicit specification of --outSAMattributes?

alexdobin commented 3 years ago

Hi Jiří

Every rule has an exception. :) Also, with --outSAMattrRGline we explicitly specify that we want RG tags.

But you are right, it seems to be confusing to a number of people, so I will add it to my TODO list to change the behavior.

Cheers Alex

olekskrav commented 1 year ago

Hi Alex,

I just came here with the same problem and fortunately I found this explanation before opening a new issue.

It would indeed be great to have an extra sentence in Section 3.2 "Mapping multiple files in one run" of the STAR manual, saying that "--outSAMattributes RG" must be added explicitly if we want the read group ID to be output for each read when using the "--readFilesManifest" option. The current wording suggests that the "--readFilesIn ... --outSAMattrRGline ..." combination is equivalent to "--readFilesManifest ...", but, as you explained, it is not. Updating the manual might actually solve the issue.
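The two setups compared in this thread can be sketched side by side. This is a minimal Python sketch, assuming paired-end input and hypothetical sample names and file paths; it only builds STAR argument lists and manifest text and does not run STAR. The manifest layout (read1, read2 or "-" for single-end, read group line, tab-separated) and the " , " separator between RG lines in --outSAMattrRGline follow the STAR manual's description; NH HI AS nM is the documented "Standard" attribute set, with RG appended.

```python
# Hypothetical sample names and FASTQ paths, for illustration only.
SAMPLES = {
    "sampleA": ("A_R1.fq", "A_R2.fq"),
    "sampleB": ("B_R1.fq", "B_R2.fq"),
}

# NH HI AS nM is STAR's "Standard" attribute preset; RG is appended explicitly.
SAM_ATTRIBUTES = ["NH", "HI", "AS", "nM", "RG"]


def readfilesin_args(samples):
    """--readFilesIn route: comma-separated mates, one RG line per input file.

    In a shell the RG lines are separated by ' , ' (a comma surrounded by
    spaces); in an argv list that comma is simply its own argument.
    """
    read1 = ",".join(mate1 for mate1, _ in samples.values())
    read2 = ",".join(mate2 for _, mate2 in samples.values())
    rg_lines = []
    for i, sample in enumerate(samples):
        if i:
            rg_lines.append(",")
        rg_lines.append(f"ID:{sample}")
    return ["--readFilesIn", read1, read2, "--outSAMattrRGline", *rg_lines]


def manifest_text(samples):
    """Tab-separated manifest rows: read1, read2 ('-' if single-end), read group."""
    rows = [
        f"{mate1}\t{mate2 or '-'}\tID:{sample}"
        for sample, (mate1, mate2) in samples.items()
    ]
    return "\n".join(rows) + "\n"


def manifest_args(manifest_path):
    """--readFilesManifest route: RG is also requested via --outSAMattributes,
    since per this thread it otherwise only appears in the @RG header lines."""
    return ["--readFilesManifest", manifest_path, "--outSAMattributes", *SAM_ATTRIBUTES]
```

For example, one could write manifest_text(SAMPLES) to a file such as manifest.tsv and call subprocess.run(["STAR", "--genomeDir", genome_dir, *manifest_args("manifest.tsv")]), where genome_dir is a placeholder for an existing index. Listing the attributes explicitly is a design choice: passing only RG, as in this thread, would presumably leave out the standard NH HI AS nM tags.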

In any case, thanks for the helpful responses to all the issues!
