chollenbeck / rad_haplotyper

MIT License

Feature request: new format #24

Open zacforsman opened 6 years ago

zacforsman commented 6 years ago

How difficult would it be to output a FASTA multiple sequence alignment with all loci concatenated together? The IMa format is close, but it's a bit of a pain to go from IMa to a FASTA alignment of all loci. A concatenated FASTA would be a useful starting point for conversion (e.g., with PGDSpider) to formats for a wide variety of other programs, enabling many kinds of downstream analysis. This feature would be awesome! Thanks. -Zac
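For anyone who wants this before it exists in rad_haplotyper, the concatenation itself is simple once each locus's alignment is in memory. Below is a minimal Python sketch. It assumes a hypothetical `{locus: {sample: aligned_sequence}}` layout and uses hypothetical function names (`concatenate_loci`, `to_fasta`); it is not part of rad_haplotyper. Samples with no data at a locus are padded with `N` so that every row keeps the same length.

```python
from typing import Dict, List

# Hypothetical layout: {locus_name: {sample_name: aligned_sequence}}
Alignments = Dict[str, Dict[str, str]]


def concatenate_loci(loci: Alignments, missing: str = "N") -> Dict[str, str]:
    """Join per-locus alignments end to end into one supermatrix.

    Samples absent from a locus are padded with `missing`, so every row
    of the result has the same length.
    """
    samples = sorted({s for seqs in loci.values() for s in seqs})
    rows: Dict[str, List[str]] = {s: [] for s in samples}
    for locus in sorted(loci):  # same locus order for every sample
        seqs = loci[locus]
        if not seqs:
            continue  # nothing genotyped at this locus
        widths = {len(seq) for seq in seqs.values()}
        if len(widths) != 1:
            raise ValueError(f"sequences at {locus} are not all the same length")
        width = widths.pop()
        for s in samples:
            rows[s].append(seqs.get(s, missing * width))
    return {s: "".join(parts) for s, parts in rows.items()}


def to_fasta(matrix: Dict[str, str]) -> str:
    """Format the supermatrix as FASTA, one record per sample."""
    return "".join(f">{name}\n{seq}\n" for name, seq in matrix.items())
```

Loci are joined in sorted-name order, which is the same for every sample. Passing `missing="?"` instead of `N` may suit programs that distinguish missing data from ambiguous bases.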

cbird808 commented 6 years ago

PGDSpider should be able to convert the IMa2 file from rad_haplotyper.

zacforsman commented 6 years ago

Thanks, but I haven't been able to get this to work.

I've tried various settings in PGDSpider, but I keep getting input file errors (although when I check that line of the file, nothing looks unusual). I also get a java.lang.ArrayIndexOutOfBoundsException: 4 error.

If anyone has been able to do this, let me know; if not, I'll keep trying. -Zac


Zac H. Forsman, Ph.D. Researcher, Hawaii Institute of Marine Biology 46-007 Lilipuna Rd, Kaneohe, HI 96744 (Fedex or deliveries) Google Scholar profile: https://scholar.google.com/citations?hl=en&user=MyhFvt4AAAAJ

zacforsman commented 6 years ago

Any ideas on how to get PGDSpider to work with rad_haplotyper output? If you could try it with some test data, I would appreciate it. Thanks! -Zac

chollenbeck commented 6 years ago

Hi Zac,

Could you post a minimal example along with the specific errors that you are getting?

Chris

zacforsman commented 6 years ago

Thanks, Chris. Here is some more information.

The IMa2 file from rad_haplotyper (saved as radhaplotypesall.fasta) is 38 MB. Here is what the head of the file looks like:

IMa Test
23
AAP ABU ADE AFUL AFUS ALI AMU ASO AU AUN NCU PE PHE PMI PPE PPH PPR PRE PSE PTER PTES PVA TIR
(0,1):2
1732
dDocent_Contig_43395 2 2 2 26 4 2 2 2 4 4 6 2 2 2 311 I 1
AFUL_2A NGATCCTATACCCGACAAGCCTACGGCCGAGGATGGTAATAGCTCCACCTCAAACCTGAGCCCTAACACTTAACAGACTAACAACAACAAACCCAATCTGAGACGAGTGCGTGTCTTGTGTCATTGTAATAATTGTGATTCAATTAGAAAGAAAATTAAAATGTAAAAAAATTATAATAAAAGTTTTTTTGAAATTTTTTTGGTTTGTGTTTTTTTTATTTATTATTTTATGTATATGTTTTCTCACCCTGGTCGTTTTAAAAGAATTCACTCTGTAACCTGTAGTTACTAGAATACTAAGAAAGAGATCN
AFUL_2B NGATCCTATACCCGACAAGCCTACGGCCGAGGATGGTAATAGCTCCACCTCAAACTTGAGCCCTAACACTTAACAGACTAACAACAACAAACCCAATCTGAGACGAGTGCGTGTCTTGTGTCATTGTAATAATTGTGATTCAATTAGAAAGAAAATTAAAATGTAAAAAAATTATAATAAAAGTTTTTTTGAAATTTTTTTGGTTTGTGTTTTTTTTATTTATTATTTTATGTATATGTTGTCTCACCCTGGTCGTTTTAAAAGAGTTCACTCTGTAACCTGTAGTTACTAGAATACTAAGAAAGAGATCN
AFUS_1A NGATCCTATACCCGACAAGCCTACGGCCGAGGATGGTAATAGCTCCACCTCAAACCTGAGCCCTAACACTTAACAGACTAACAACAACAAACCCAATCTGAGACGAGTGCGTGTCTTGTGTCATTGTAATAATTGTGATTCAATTAGAAAGAAAATTAAAATGTAAAAAAATTATAATAAAAGTTTTTTTGAAATTTTTTTGGTTTGTGTTTTTTTTATTTATTATTTTATGTATATGTTTTCTCACCCTGGTCGTTTTAAAAGAATTCACTCTGTAACCTGTAGTTACTAGAATACTAAGAAAGAGATCN
AFUS_1B NGATCCTATACCCGACAAGCCTACGGCCGAGGATGGTAATAGCTCCACCTCAAACCTGAGCCCTAACACTTAACAGACTAACAACAACAAACCCAATCTGAGACGAGTGCGTGTCTTGTGTCATTGTAATAATTGTGATTCAATTAGAAAGAAAATTAAAATGTAAAAAAATTATAATAAAAGTTTTTTTGAAATTTTTTTGGTTTGTGTTTTTTTTATTTATTATTTTATGTATATGTTGTCTCACCCTGGTCGTTTTAAAAGAGTTCACTCTGTAACCTGTAGTTACTAGAATACTAAGAAAGAGATCN

With PGDSpider I have tried many different options, but I keep getting "Parse Error: Unable to parse input data":

INFO 15:40:35 - convert IMa2:radhaplotypesall.fasta to PHYLIP (RAxML):test.txt
java.lang.ArrayIndexOutOfBoundsException: 25
    at ch.unibe.iee.cmpg.pgdspider.model.parser.Ima2Parser.parse(Ima2Parser.java:186)
    at ch.unibe.iee.cmpg.pgdspider.gui.tasks.ConverterThread.run(ConverterThread.java:63)
    at java.lang.Thread.run(Thread.java:745)
ERROR 15:40:36 - input file error at line: 6

I get the same error when converting from IMa2 format to NEXUS, PHYLIP, or FASTA. When I try the IMa format instead, I get an error at line 2.

I've tried multiple settings in the SPID file, and none of them seem to matter; I keep getting the same errors. Any suggestions would be greatly appreciated.

-Zac


chollenbeck commented 6 years ago

It looks like you have some small sample sizes. Are you sure that all of your populations actually have genotyped individuals at each locus? If all of the individuals in a population are missing data at a locus, the format string for that locus (line 6 for the first locus) will not be specified correctly, and PGDSpider will throw an error.
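One way to confirm this diagnosis before running PGDSpider is to walk the locus lines and check that each one carries a sample size for every population named in the header. A minimal Python sketch follows. `check_ima2` is a hypothetical helper (not part of rad_haplotyper or PGDSpider), and it assumes lines 1 to 5 are the IMa2 header (description, number of populations, population names, tree string, number of loci) with no `#` comment lines.

```python
from typing import Optional, Tuple


def check_ima2(text: str) -> Optional[Tuple[int, str]]:
    """Return (line_number, message) for the first malformed locus line, or None.

    Assumes the header occupies lines 1-5 and there are no '#' comment lines.
    """
    lines = text.splitlines()
    n_pops = int(lines[1].split()[0])
    n_loci = int(lines[4].split()[0])
    i = 5  # 0-based index of the first locus line, i.e. line 6 of the file
    for _ in range(n_loci):
        if i >= len(lines):
            return (i + 1, "file ends before all loci were read")
        fields = lines[i].split()
        # Expected: locus name, one sample size per population,
        # sequence length, mutation model, inheritance scalar.
        if len(fields) < n_pops + 4:
            return (i + 1, f"locus line has {len(fields)} fields, expected at least {n_pops + 4}")
        counts = fields[1 : n_pops + 1]
        if not all(c.isdigit() for c in counts):
            return (i + 1, "sample sizes are not all non-negative integers")
        i += 1 + sum(int(c) for c in counts)  # skip past this locus's sequences
    return None
```

On the head of Zac's file (23 populations in the header, but only 14 sample sizes on the first locus line), this sketch returns `(6, "locus line has 18 fields, expected at least 27")`, which points at the same line 6 that PGDSpider reports.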

Josh-Copus commented 6 years ago

Hi Zac,

The IMa file must include a sample size for every one of your populations on line 6. When a population has no data at a locus, there should still be a zero in its position. This is how the data are associated with the populations, so make sure everything is in the correct order. In your example, line 6 should look more like this: dDocent_Contig_43395 0 0 0 2 2 2 26 0 4 0 2 2 0 2 0 4 4 0 6 2 0 0 2 0 0 2 311 I 1

Unfortunately, rad_haplotyper does not format it this way. It will only format the IMa file correctly if at least one individual is represented in each population at every locus.

Josh
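Josh's fix amounts to writing one sample size per population, in header order, with `0` wherever a population has no individuals at the locus. Below is a minimal Python sketch of building such a line. `locus_line` and its defaults are hypothetical (the `I` model code and scalar `1` are simply copied from the example line), and the population list is assumed to be in the same order as line 3 of the file.

```python
from typing import Dict, List


def locus_line(name: str, populations: List[str], counts: Dict[str, int],
               length: int, model: str = "I", scalar: float = 1) -> str:
    """Build an IMa2 locus line with one sample size per population.

    `populations` must follow the header's population order; any
    population missing from `counts` gets a 0.
    """
    unknown = sorted(set(counts) - set(populations))
    if unknown:
        raise ValueError(f"populations not in the header: {unknown}")
    sizes = [str(counts.get(pop, 0)) for pop in populations]
    return " ".join([name, *sizes, str(length), model, format(scalar, "g")])
```

For example, with a hypothetical five-population header, `locus_line("dDocent_Contig_43395", ["AAP", "ABU", "ADE", "AFUL", "AFUS"], {"AFUL": 2, "AFUS": 2}, 311)` gives `dDocent_Contig_43395 0 0 0 2 2 311 I 1`. The sequence lines for that locus must then follow in the same population order.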

zacforsman commented 6 years ago

Missing loci are not necessarily a problem; for example, see the attached output file from pyRAD.

If we could get dDocent to output this format, it would be useful for phylogenomic-level comparisons.

-Zac

example_desired_output.nex.txt

jpuritz commented 6 years ago

That's NEXUS format. I think you should be able to take the IMa output and convert it to NEXUS with PGDSpider.
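If the per-locus alignments are already in hand, a concatenated NEXUS matrix can also be written directly, skipping the IMa step. Below is a minimal Python sketch, assuming aligned DNA in a hypothetical `{locus: {taxon: sequence}}` layout. `to_nexus` is hypothetical and is not pyRAD's or PGDSpider's writer, so its exact layout may differ from the attached example. Missing taxa are filled with `?`, and a SETS block records each locus's column range as a CHARSET so the per-locus partitions survive concatenation.

```python
from typing import Dict, List


def to_nexus(loci: Dict[str, Dict[str, str]]) -> str:
    """Write per-locus DNA alignments as one concatenated NEXUS matrix.

    Taxa missing from a locus are filled with '?'. Assumes locus and taxon
    names are valid NEXUS tokens (no spaces or punctuation).
    """
    taxa = sorted({t for seqs in loci.values() for t in seqs})
    rows: Dict[str, List[str]] = {t: [] for t in taxa}
    charsets: List[str] = []
    start = 1  # NEXUS character positions are 1-based
    for locus in sorted(loci):
        seqs = loci[locus]
        if not seqs:
            continue  # no data at this locus
        widths = {len(s) for s in seqs.values()}
        if len(widths) != 1:
            raise ValueError(f"sequences at {locus} are not all the same length")
        width = widths.pop()
        for t in taxa:
            rows[t].append(seqs.get(t, "?" * width))
        charsets.append(f"    CHARSET {locus} = {start}-{start + width - 1};")
        start += width
    out = [
        "#NEXUS",
        "BEGIN DATA;",
        f"    DIMENSIONS NTAX={len(taxa)} NCHAR={start - 1};",
        "    FORMAT DATATYPE=DNA MISSING=? GAP=-;",
        "    MATRIX",
        *(f"    {t} {''.join(rows[t])}" for t in taxa),
        "    ;",
        "END;",
        "BEGIN SETS;",
        *charsets,
        "END;",
    ]
    return "\n".join(out) + "\n"
```

The CHARSET lines are optional for a plain concatenated matrix, but they keep locus boundaries available for partitioned analyses.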

stuartwillis commented 5 years ago

I can confirm that missing localities cause PGDSpider to throw an error.

Assuming the IMa file is formatted properly, I have a bash script that will extract each locus (or selected loci) into PHYLIP files, since the conversion has to be done one locus at a time. It's very slow, though. It probably just needs to be written as a Perl script that bypasses PGDSpider altogether.
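Along those lines (in Python rather than Perl), here is a minimal sketch that reads an IMa2 file in one pass and writes one PHYLIP file per locus without going through PGDSpider. It assumes the file is well formed (every locus line has a sample size, including zeros, for every population), lines 1 to 5 are the header, there are no `#` comment lines, and sample names contain no spaces. Output is relaxed PHYLIP (name, whitespace, sequence), which RAxML generally accepts. All function names are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple

Locus = Tuple[str, List[Tuple[str, str]]]  # (locus name, [(sample, sequence), ...])


def parse_ima2(text: str) -> List[Locus]:
    """Split a well-formed IMa2 file into loci (assumptions as stated above)."""
    lines = text.splitlines()
    n_pops = int(lines[1].split()[0])
    n_loci = int(lines[4].split()[0])
    i, loci = 5, []
    for _ in range(n_loci):
        fields = lines[i].split()
        # The per-population sample sizes say how many sequence lines follow.
        n_seqs = sum(int(c) for c in fields[1 : n_pops + 1])
        seqs = []
        for line in lines[i + 1 : i + 1 + n_seqs]:
            name, seq = line.split(None, 1)
            seqs.append((name, "".join(seq.split())))  # drop any internal spaces
        loci.append((fields[0], seqs))
        i += 1 + n_seqs
    return loci


def to_phylip(seqs: List[Tuple[str, str]]) -> str:
    """Relaxed PHYLIP: 'ntax nchar' header, then one 'name sequence' row each."""
    return f"{len(seqs)} {len(seqs[0][1])}\n" + "".join(f"{n} {s}\n" for n, s in seqs)


def write_phylip_per_locus(ima2_path: str, out_dir: str) -> int:
    """Write <out_dir>/<locus>.phy for every locus with data; return the count."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = 0
    for locus, seqs in parse_ima2(Path(ima2_path).read_text()):
        if seqs:
            (out / f"{locus}.phy").write_text(to_phylip(seqs))
            written += 1
    return written
```

For example, `write_phylip_per_locus("radhaplotypesall.fasta", "phylip_loci")` would write one `.phy` file per locus. On a file that lacks the zero sample sizes (like the one earlier in this thread), it would read the wrong number of sequences per locus, so the locus lines need fixing first.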