Closed JimStarrett closed 5 years ago
Is it possible that you have Windows (or "classic mac") line endings? That could account for this, because it works for me with the following.
"test.fa":
>TaxonA
AAATTTCCCTGTCCCTTTAA
>TaxonB
GCTCGAGGGGCCCCAAGACC
>TaxonC
ACGCTCCCCCTTAAAAATGA
>TaxonD
TCCTTGTTCAACTCCGGTGG
>TaxonE
TTACTATTCCCCCCCGCCGG
>I21560_AUMS19499_Araneae_Lycosidae_Schizocosa_ocreata_seq1
AAATTTCCCTGTCCCTTTAA
>I21567_AUMS19519_Araneae_Lycosidae_Schizocosa_uetzi_rovneri_seq1
AAATTTCCCTGTCCCTTTAA
>I21570_AUMS19434_Araneae_Lycosidae_Schizocosa_ocreata_seq1
AAATTTCCCTGTCCCTTTAA
>I21563_AUMS19507_Araneae_Lycosidae_Schizocosa_retrorsa_seq1
AAATTTCCCTGTCCCTTTAA
"name.txt":
I21560_AUMS19499_Araneae_Lycosidae_Schizocosa_ocreata_seq1
I21567_AUMS19519_Araneae_Lycosidae_Schizocosa_uetzi_rovneri_seq1
I21570_AUMS19434_Araneae_Lycosidae_Schizocosa_ocreata_seq1
I21563_AUMS19507_Araneae_Lycosidae_Schizocosa_retrorsa_seq1
Now, run:
$ pxrms -s test.fa -f name.txt
>TaxonA
AAATTTCCCTGTCCCTTTAA
>TaxonB
GCTCGAGGGGCCCCAAGACC
>TaxonC
ACGCTCCCCCTTAAAAATGA
>TaxonD
TCCTTGTTCAACTCCGGTGG
>TaxonE
TTACTATTCCCCCCCGCCGG
And when I run it with -c I get:
$ pxrms -f name.txt -s test.fa -c
>I21560_AUMS19499_Araneae_Lycosidae_Schizocosa_ocreata_seq1
AAATTTCCCTGTCCCTTTAA
>I21567_AUMS19519_Araneae_Lycosidae_Schizocosa_uetzi_rovneri_seq1
AAATTTCCCTGTCCCTTTAA
>I21570_AUMS19434_Araneae_Lycosidae_Schizocosa_ocreata_seq1
AAATTTCCCTGTCCCTTTAA
>I21563_AUMS19507_Araneae_Lycosidae_Schizocosa_retrorsa_seq1
AAATTTCCCTGTCCCTTTAA
Maybe copy-paste these examples and make sure they work for you?
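If the copy-pasted example works but your own files do not, a quick check is to count how many entries in the name list exactly match a FASTA header. This is only a sketch using standard tools (grep, sed, wc), not anything pxrms itself provides; the temporary files and their contents are made up for illustration, and the second name deliberately carries a stray carriage return.

```shell
# Hypothetical demo files in a temporary directory
tmp=$(mktemp -d)
printf '>TaxonA\nAAAT\n>TaxonB\nGCTC\n>TaxonC\nACGC\n' > "$tmp/test.fa"
printf 'TaxonA\nTaxonC\r\n' > "$tmp/name.txt"   # second name ends in CR+LF

# Pull the header names out of the FASTA file (drop the leading ">")
grep '^>' "$tmp/test.fa" | sed 's/^>//' > "$tmp/headers.txt"

# Count the name-list lines that are exactly equal to some header
matched=$(grep -c -x -F -f "$tmp/headers.txt" "$tmp/name.txt")
total=$(wc -l < "$tmp/name.txt" | tr -d ' ')
echo "$matched of $total names match a FASTA header"
```

If fewer names match than expected, invisible characters (such as line endings) or small spelling differences in the list are the likely suspects.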
Yep, that sort of seems to have been the issue. I was able to replicate your example. My name.txt file did have Unix line endings, but when I changed the file type of my name.txt to Text File in TextWrangler it worked properly. Thanks for your help!
Glad we could figure this (one) out! If you are dealing with Windows files, you can do:
dos2unix FILES
If "classic mac" (which is more likely if you are using TextWrangler), do:
dos2unix -c mac FILES
Because I come across this every so often, I have the following alias in my .bashrc:
alias mac2unix='dos2unix -c mac'
so then if I come across a mac file I can just:
mac2unix FILE
HTH. And I wonder if this is involved in your other issue (#95)?
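If dos2unix is not installed, the same conversions can be approximated with tr. This is a rough sketch under the assumption that the files are plain text. The file names are invented for the demo, and counting carriage returns is just one way to spot the problem (the `file` utility also reports CR or CRLF line terminators).

```shell
# Hypothetical demo files: one Windows-style (CR+LF), one classic-Mac-style (CR only)
tmp=$(mktemp -d)
printf 'TaxonA\r\nTaxonB\r\n' > "$tmp/win.txt"
printf 'TaxonA\rTaxonB\r'     > "$tmp/mac.txt"

# A non-zero count of carriage returns means the file is not pure Unix text
cr_win=$(tr -cd '\r' < "$tmp/win.txt" | wc -c | tr -d ' ')
cr_mac=$(tr -cd '\r' < "$tmp/mac.txt" | wc -c | tr -d ' ')
echo "win.txt: $cr_win CR, mac.txt: $cr_mac CR"

# Windows -> Unix: delete every CR, leaving the LF
tr -d '\r' < "$tmp/win.txt" > "$tmp/win_unix.txt"
# Classic Mac -> Unix: turn every CR into an LF
tr '\r' '\n' < "$tmp/mac.txt" > "$tmp/mac_unix.txt"
```

Both converted files should then contain the same two Unix-terminated lines.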
Thank you for those conversion commands! I'll take a look at my tree files to see if that is the issue for #95.
Now that I have the pxrms command working for one file I am trying to implement this in a shell script with a loop so I can do this for about 500 alignments. Do you have any suggestions for a 'for loop'?
For example I have align_1.fasta align_2.fasta align_3.fasta etc.
and want align_1_reduced_taxa.fasta align_2_reduced_taxa.fasta align_3_reduced_taxa.fasta until the last file.
If you want to process all of the fasta files in a directory (i.e. remove the same set of taxa from all) you can do:
for x in *.fasta; do pxrms -f name.txt -s "$x" -c -o "${x}_reduced.fa"; done
This will generate files named align_*.fasta_reduced.fa. There are cleverer ways to get the exact output name you specify above, but I would have to think about it.
Ok, you can do this:
for x in *.fasta; do pxrms -f name.txt -s "$x" -c -o "${x%.*}_reduced.fasta"; done
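To get exactly the names asked for (align_1_reduced_taxa.fasta and so on), the suffix can be stripped with shell parameter expansion before the new ending is appended. The sketch below is a dry run: it only prints the pxrms commands, and the three input names are placeholders for the real alignments.

```shell
# Hypothetical input names, standing in for the ~500 alignments
for x in align_1.fasta align_2.fasta align_3.fasta; do
    # ${x%.fasta} is the value of x with the trailing ".fasta" removed
    out="${x%.fasta}_reduced_taxa.fasta"
    # Dry run: print the command instead of executing it
    echo pxrms -f name.txt -s "$x" -c -o "$out"
done
```

Once the printed commands look right, drop the `echo` and loop over `*.fasta`. Note that the outputs also end in .fasta, so rerunning the loop in the same directory would pick them up as inputs; writing them to a separate directory avoids that.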
Hello again
I am running pxrms but the outfile I am getting is an exact copy of the infile. So, no individuals are being pruned out. I would actually like to get the complement, but when I run it with the -c flag it returns an empty file. Not sure what is going on, but here is my command and an example of my name list. I also tried with the comma separated list but no success.
pxrms -f name.txt -s T396_L1.fasta -o out5.fa
The format of my name list is like so: