caravagnalab / rRACES

R wrapper for the RACES package
GNU General Public License v3.0

Add SM tag in RG field of simulated SAM files #91

Closed giorgiagandolfi closed 8 months ago

giorgiagandolfi commented 8 months ago

I am testing sarek variant calling starting from SAM files generated by both simulate_seq and simulate_normal_seq. I would suggest adding the SM tag to the @RG field in the SAM header, since most steps of the analysis require it. In general, the SM tag refers to the sample name, which is unique. Here is an example:

@HD   VN:1.6 SO:unknown
@SQ   SN:22  LN:51304566   AN:chromosome22,chr22,chromosome_22,chr_22
@RG   ID:S1_1   SM:S1
@RG   ID:S1_2   SM:S1

albertocasagrande commented 8 months ago

In the header produced by RACES, the IDs are the sample names. So, if this matches your request, we can produce a header like this:

@HD   VN:1.6 SO:unknown
@SQ   SN:22  LN:51304566   AN:chromosome22,chr22,chromosome_22,chr_22
@RG   ID:S1_1   SM:S1_1
@RG   ID:S1_2   SM:S1_2
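
For SAM files that were already generated without the SM tag, the header above can be approximated after the fact. The following is a minimal Python sketch, not part of RACES or rRACES: the function name `add_sm_to_rg` is hypothetical, and it assumes header fields are tab-separated (as the SAM format requires) and that reusing the read-group ID as the sample name is acceptable.

```python
def add_sm_to_rg(header_line: str) -> str:
    """Append SM:<ID> to an @RG header line that has no SM tag.

    Hypothetical helper: any other line, or an @RG line that already
    carries SM (or lacks ID), is returned unchanged.
    """
    fields = header_line.rstrip("\n").split("\t")
    if fields[0] != "@RG":
        return "\t".join(fields)
    # Header tags have the form TAG:VALUE with a two-character TAG.
    tags = {f[:2]: f[3:] for f in fields[1:] if len(f) > 2 and f[2] == ":"}
    if "SM" in tags or "ID" not in tags:
        return "\t".join(fields)
    # Assumption: the read-group ID doubles as the sample name.
    return "\t".join(fields + ["SM:" + tags["ID"]])
```

Applied to every header line of a SAM file, this leaves @HD and @SQ lines and all alignment records untouched and only extends @RG lines.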

giorgiagandolfi commented 8 months ago

Yes, sorry, you're right. So if I want a separate SAM file for each sample, I will split the SAM. Thanks.

albertocasagrande commented 8 months ago

You can already split the SAM as described in the documentation.
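
The procedure from the documentation is not reproduced here. As a generic illustration of what splitting by sample involves, here is a minimal Python sketch that is independent of RACES. It assumes each alignment record carries an `RG:Z:` optional field matching an `@RG` ID in the header (whether the simulated files do so is an assumption), and the function name `split_sam_by_read_group` is hypothetical.

```python
from collections import defaultdict


def split_sam_by_read_group(lines):
    """Group SAM lines by read group (hypothetical helper).

    Returns a dict mapping each @RG ID to the lines of a per-sample SAM:
    the shared header lines, that read group's @RG line, then its reads.
    Reads without an RG:Z: tag, or with an undeclared ID, are dropped.
    """
    shared_header = []          # @HD, @SQ, ... copied to every output
    rg_header = {}              # read-group ID -> its @RG line
    reads = defaultdict(list)   # read-group ID -> alignment records
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            if fields[0] == "@RG":
                rg_id = next((f[3:] for f in fields[1:] if f.startswith("ID:")), None)
                if rg_id is not None:
                    rg_header[rg_id] = line
            else:
                shared_header.append(line)
            continue
        # Optional fields follow the 11 mandatory alignment columns.
        rg_id = next((f[5:] for f in fields[11:] if f.startswith("RG:Z:")), None)
        if rg_id is not None:
            reads[rg_id].append(line)
    return {
        rg_id: shared_header + [rg_line] + reads[rg_id]
        for rg_id, rg_line in rg_header.items()
    }
```

Each value of the returned dict can be written out, one line per record, as a standalone SAM file for that sample. The sketch keeps everything in memory, so it is only suitable for small files.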

In any case, do you think I should add the SM tag as suggested above?