faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

sample name issue when mapping against one ref with phyluce_snp_bwa_multiple_align #259

Open leonvarhan opened 2 years ago

leonvarhan commented 2 years ago

Hello Dr. Faircloth,

I used phyluce_snp_bwa_multiple_align to map reads against the contig reference of my best sample instead of against individual references. I then ran phyluce_snp_phase_uces and phyluce_align_seqcap_align, and now my UCE alignments no longer carry the names of the individual samples, only the name of the one sample I used as the reference. For example:

'uce-123_ref_sample_0' CCAGAAAAATAGAAGGGATCTTTTTCCTGAT...
'uce-123_ref_sample_1' CCAGAAAAATAGAAATGATTCTTTTTTCCTGAT...
'uce-123_ref_sample_0.copy' ATTGAAAAAATGCCCTAGAAATCTCCTGATCAAAGATCC...
'uce-123_ref_sample_1.copy' ATTGAAAAATAGAAGTAAACCATCTCCTGATCAAAGATCC...
'uce-123_ref_sample_0.copy1' AGAAAAATAGAAATCTCCTGATCAAAAAAAGATCC...
'uce-123_ref_sample_1.copy1' AGAAAAATAGAAATCTCCTGATCAAAAAAAAGATCC...

After running phyluce_snp_phase_uces, the files in my -bams-phased-reads folder have the correct sample names, but if I open a .clean.balanced.fasta file, for example, I only see the name of the one sample I used as the reference.

Is there a way to map reads to one reference and still keep the individual sample names after phasing? I was thinking of making copies of my ref_sample .fasta file and giving each copy an individual sample name for my mapping.conf file, but I am wondering whether there is a better way to do this.

Thank you so much for your time!
Leo
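The copy-per-sample idea above (or the symlink variant suggested in the reply) could be scripted roughly as follows. This is a minimal sketch, not part of phyluce: the function name `make_per_sample_references`, the `<sample>.fasta` naming scheme, and the paths are hypothetical, and each sample's entry in mapping.conf would still need to point at its own file. It also assumes the sample name that reaches the output comes from the per-sample file or config entry; if the FASTA headers inside the reference themselves carry the reference sample's name, those headers would need renaming as well.

```python
import shutil
from pathlib import Path


def make_per_sample_references(reference: Path, samples: list[str], outdir: Path,
                               use_symlinks: bool = True) -> list[Path]:
    """Create one reference FASTA per sample, named <sample>.fasta (hypothetical scheme).

    Each file is either a symlink to, or a physical copy of, the single shared
    reference. Assumption: downstream steps take the sample name from this
    per-sample file/config entry, not from the headers inside the FASTA.
    """
    outdir.mkdir(parents=True, exist_ok=True)
    created = []
    for sample in samples:
        target = outdir / f"{sample}.fasta"
        # Remove a stale file or (possibly dangling) link from an earlier run.
        if target.exists() or target.is_symlink():
            target.unlink()
        if use_symlinks:
            # Symlinks save disk space; point them at the absolute reference path.
            target.symlink_to(reference.resolve())
        else:
            shutil.copyfile(reference, target)
        created.append(target)
    return created
```

For example, `make_per_sample_references(Path("ref_sample.fasta"), ["sampleA", "sampleB"], Path("per-sample-refs"))` would produce `per-sample-refs/sampleA.fasta` and `per-sample-refs/sampleB.fasta`, both linking to the same reference.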

brantfaircloth commented 2 years ago

Hi Leo,

Your best bet would be to take the approach you describe (you might also be able to use symlinks instead of physical copies if storage space is an issue). Alternatively, since you already have the output files, you could rename the FASTA headers in each file using sed or a similar tool (e.g., to replace ref with the correct sample name).
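The header-renaming route can be done with GNU sed, e.g. `sed -i '/^>/s/ref_sample/sampleA/g' sampleA.fasta`, which substitutes only on lines starting with `>`. A rough Python equivalent is sketched below; the function names and the `ref_sample`/`sampleA` strings are hypothetical placeholders. It is probably easiest to apply this to the per-sample files before alignment, since once samples are merged into one alignment their records are distinguished only by `.copy` suffixes. Also choose the string to replace carefully, because a plain substring replace will also hit longer names that contain it.

```python
from pathlib import Path


def rename_fasta_headers(text: str, old: str, new: str) -> str:
    """Replace `old` with `new` on FASTA header lines only (lines starting with '>').

    Sequence lines are left untouched, mirroring sed's /^>/ address.
    """
    out_lines = []
    for line in text.splitlines(keepends=True):
        if line.startswith(">"):
            line = line.replace(old, new)
        out_lines.append(line)
    return "".join(out_lines)


def rename_fasta_file(path: Path, old: str, new: str) -> None:
    """Rewrite one FASTA file in place with renamed headers (hypothetical helper)."""
    path.write_text(rename_fasta_headers(path.read_text(), old, new))
```

For a per-sample file, `rename_fasta_file(Path("sampleA.fasta"), "ref_sample", "sampleA")` would turn a header such as `>uce-123_ref_sample_0` into `>uce-123_sampleA_0`.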

-b