genophenoenvo / terraref-datasets

Repository for code and small datasets derived from the TERRA REF program
MIT License

Generate evolutionary distance for all genomic data #75

Closed kshefchek closed 4 years ago

kshefchek commented 4 years ago

As a first pass I will use TASSEL (https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/DistanceMatrix/DistanceMatrix), but there are many ways to go about this (see also https://www.cog-genomics.org/plink/2.0/distance).

cc @rossarun
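
For reference, the TASSEL page linked above describes its distance as 1 minus IBS (identity by state) similarity, averaged over non-missing sites. The sketch below illustrates that idea on toy data. It is not TASSEL's code: the 0/1/2 alternate-allele coding, the `-1` missing marker, and the cultivar names are assumptions made for the example.

```python
import math

def ibs_distance(g1, g2, missing=-1):
    """Return 1 - mean IBS similarity over sites called in both samples.

    Genotypes are alternate-allele counts (0, 1, or 2) for a diploid.
    Returns NaN when the two samples share no called sites.
    """
    total = 0.0
    n_sites = 0
    for a, b in zip(g1, g2):
        if a == missing or b == missing:
            continue  # skip sites where either call is missing
        pa, pb = a / 2.0, b / 2.0
        # Probability that one allele drawn at random from each sample matches.
        total += pa * pb + (1.0 - pa) * (1.0 - pb)
        n_sites += 1
    if n_sites == 0:
        return math.nan
    return 1.0 - total / n_sites

def distance_matrix(genotypes):
    """Build a square matrix (list of lists) from a {name: genotype list} dict."""
    names = sorted(genotypes)
    matrix = [
        [0.0 if a == b else ibs_distance(genotypes[a], genotypes[b]) for b in names]
        for a in names
    ]
    return names, matrix

# Hypothetical toy data: three cultivars, four sites.
toy = {
    "cultivar_A": [0, 0, 2, 1],
    "cultivar_B": [0, 2, 2, 1],
    "cultivar_C": [-1, -1, -1, -1],  # no calls at all
}
names, matrix = distance_matrix(toy)
```

A pair with no shared called sites comes out as NaN here, which is one plausible way NaN entries can end up in a distance matrix.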

kshefchek commented 4 years ago

The results for this can be found here: https://data.monarchinitiative.org/genophenoenvo/tassel5/all_cultivars_distance.txt

@rbartelme is there any way to sanity check this (e.g., cultivars we expect to be more distant or more closely related)?
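
Independent of biological expectations, a few structural checks are cheap: the matrix should be square and symmetric, have a zero diagonal, and contain values between 0 and 1. Listing the closest and most distant pairs also gives something concrete to compare against known pedigrees. A minimal sketch, assuming the matrix has already been parsed into a list of names and a list of lists of floats (the parsing step and both function names are hypothetical):

```python
import math

def check_distance_matrix(names, matrix, tol=1e-9):
    """Return a list of human-readable problems; an empty list means none found."""
    n = len(names)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        return ["matrix is not %d x %d" % (n, n)]
    problems = []
    for i in range(n):
        if abs(matrix[i][i]) > tol:
            problems.append("non-zero diagonal for %s" % names[i])
        for j in range(i + 1, n):
            d, e = matrix[i][j], matrix[j][i]
            if math.isnan(d) or math.isnan(e):
                problems.append("NaN for %s / %s" % (names[i], names[j]))
            elif abs(d - e) > tol:
                problems.append("asymmetric entry for %s / %s" % (names[i], names[j]))
            elif not 0.0 <= d <= 1.0:
                problems.append("out-of-range value for %s / %s" % (names[i], names[j]))
    return problems

def ranked_pairs(names, matrix):
    """Return (distance, name_i, name_j) tuples, closest pair first, skipping NaN."""
    n = len(names)
    pairs = [
        (matrix[i][j], names[i], names[j])
        for i in range(n)
        for j in range(i + 1, n)
        if not math.isnan(matrix[i][j])
    ]
    return sorted(pairs)
```

The first and last few entries of `ranked_pairs` are the ones worth checking against cultivars expected to be close or distant.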

rbartelme commented 4 years ago

@kshefchek I think the BAP paper may offer a little insight. How does this look with just the cultivars present across all seasons? I think that's arguably more important.

kshefchek commented 4 years ago

Distance matrices for all_seasons and for all cultivars with genomic data are here: https://data.monarchinitiative.org/genophenoenvo/tassel5/distance/

jaiswalp commented 4 years ago

Kent,

I managed to reformat the distance matrix file into a *.meg file. This was used for importing into Muscle and computing a UPGMA and a Neighbor-joining tree with default parameters.

See the UPGMA tree in PDF format as well.

Files are available at the following place including your original distance matrix.

https://www.dropbox.com/sh/1j6lepl8jripx8y/AADZQhYM6T1ALbd4mYoc1Vq1a?dl=0

If I read Kent's matrix file, it says only 3 alleles were used out of the 1.19M sites. Did tassel generate an alignment?

Thanks Pankaj

-- Pankaj Jaiswal, PhD Professor Dept. of Botany and Plant Pathology 2082 Cordley Hall Oregon State University Corvallis, OR, 97331 USA Ph.: +1-541-737-8471 Fax: +1-541-737-3573 email: jaiswalp@oregonstate.edu Web: http://jaiswallab.cgrb.oregonstate.edu

jaiswalp commented 4 years ago

The tree I have looks different from yours though

https://data.monarchinitiative.org/genophenoenvo/tassel5/distance/all_cultivars.tree.txt

Pankaj

kshefchek commented 4 years ago

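I can't find any documentation on what IBS_Distance_Matrix.NumAlleles means, although looking at the source code (https://bitbucket.org/search?q=repo%3Atassel-5-source%20IBS_DISTANCE_MATRIX_NUM_ALLELES&account=%7B2600110d-1c94-4e03-8eed-63d4af30b7d7%7D) it seems this value can only be 2 or 3, so perhaps it refers to diploid and triploid genomes? That wouldn't make sense, though, since Sorghum bicolor is diploid.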

jaiswalp commented 4 years ago

Can you generate the tree with both the UPGMA and NJ methods and export them in Newick format?

https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/Cladogram/Cladogram
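
For cross-checking a tool's output on a small subset, UPGMA is simple enough to run by hand. The sketch below is a plain-Python average-linkage clustering that emits a Newick string. It is an illustration only, not what TASSEL does internally, and it assumes a complete matrix with no NaN entries (`upgma_newick` is a hypothetical name).

```python
def upgma_newick(names, matrix):
    """Cluster with UPGMA (average linkage) and return a Newick string."""
    # Active clusters: id -> (newick label, number of leaves, node height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Pairwise distances between active clusters, keyed by (smaller id, larger id).
    dist = {
        (i, j): matrix[i][j]
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (a, b), d = min(dist.items(), key=lambda kv: kv[1])
        label_a, size_a, height_a = clusters.pop(a)
        label_b, size_b, height_b = clusters.pop(b)
        height = d / 2.0
        label = "(%s:%.4f,%s:%.4f)" % (
            label_a, height - height_a, label_b, height - height_b
        )
        # Distance from the merged cluster to every other cluster is the
        # leaf-count-weighted average of the two old distances.
        new_dist = {}
        for c in clusters:
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            new_dist[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new_dist)
        clusters[next_id] = (label, size_a + size_b, height)
        next_id += 1
    (label, _, _), = clusters.values()
    return label + ";"
```

For a three-taxon matrix where A and B are 0.2 apart and both are 0.6 from C, this returns `(C:0.3000,(A:0.1000,B:0.1000):0.2000);`. Neighbor-joining uses a different merge criterion and is not covered by this sketch.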

You may want to remove the NaNs from the distance matrix. They prevented muscle from running on it, so I had to replace each NaN with 0.000.

https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/RemoveNaN/RemoveNaN
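
The replacement step can be scripted so that the affected taxa are recorded. The sketch below assumes a tab-delimited file with the taxon name in the first column, plus optional single-field header lines; that layout is an assumption about the export, not something confirmed in this thread, and `replace_nan` is a hypothetical helper.

```python
def replace_nan(lines, fill="0.000"):
    """Replace NaN cells with `fill`.

    Returns (new_lines, names of rows that contained at least one NaN).
    The first tab-separated field of each line is treated as a label
    and is never modified.
    """
    out = []
    affected = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        cells = [fill if f.strip().lower() == "nan" else f for f in fields[1:]]
        if cells != fields[1:]:
            affected.append(fields[0])
        out.append("\t".join([fields[0]] + cells))
    return out, affected
```

Note that filling with 0.000 tells the tree builder the two taxa are identical, which can pull them together in the resulting tree. Dropping the taxa listed in `affected` is the more conservative alternative if they are few.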

More on IBS here

https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/DistanceMatrix/DistanceMatrix

Also, Terry is the lead developer for Tassel. Reach out to him (tmc46@cornell.edu) if you have questions. https://www.maizegenetics.net/terrycasstevens

Pankaj

kshefchek commented 4 years ago

sure thing - those three files are now in https://data.monarchinitiative.org/genophenoenvo/tassel5/distance/