ash-dieback-crowdsource / data

Repository for crowd-sourced data from genomics analysis of the UK ash dieback (Chalara fraxinea) outbreak 2012

Nornex Tree35 assembled by TGAC #5

bjclavijo closed this pull request 11 years ago

bjclavijo commented 11 years ago

This pull request contains the Nornex preliminary first-pass draft assembly from TGAC for Tree35. The read files will be uploaded to the FTP and their paths updated.

RichardBuggs commented 11 years ago

Thanks for this, Bernardo! Do you have assembly stats for this (N50, total size, etc.), please? I am away from my desk at the moment. Was this based on 4 MiSeq runs?

best wishes,

Richard

On 14 May 2013, at 01:46, Bernardo Clavijo wrote:


You can merge this Pull Request by running

git pull https://github.com/bjclavijo/data master

Or view, comment on, or merge it at:

https://github.com/ash-dieback-crowdsource/data/pull/5

Commit Summary

Added Nornex tree35 assembly by TGAC

File Changes

A ash_dieback/fraxinus_excelsior/tree35/assemblies/gDNA/Fraxinus_excelsior_Nornex_s1v1/Fraxinus_excelsior_Nornex_s1v1.tar.gz (0)
A ash_dieback/fraxinus_excelsior/tree35/assemblies/gDNA/Fraxinus_excelsior_Nornex_s1v1/assembly.info (22)
A ash_dieback/fraxinus_excelsior/tree35/reads/gDNA/read_set_1/read_set.info (21)
A ash_dieback/fraxinus_excelsior/tree35/reads/gDNA/read_set_2/read_set.info (21)
A ash_dieback/fraxinus_excelsior/tree35/reads/gDNA/read_set_3/read_set.info (21)
A ash_dieback/fraxinus_excelsior/tree35/reads/gDNA/read_set_4/read_set.info (21)
A ash_dieback/fraxinus_excelsior/tree35/strain.info (34)

Patch Links:

https://github.com/ash-dieback-crowdsource/data/pull/5.patch
https://github.com/ash-dieback-crowdsource/data/pull/5.diff


Dr Richard Buggs | Senior Lecturer | School of Biological and Chemical Sciences, Queen Mary University of London, E1 4NS, United Kingdom | email: r.buggs@qmul.ac.uk | website: http://www.sbcs.qmul.ac.uk/staff/richardbuggs.html | office: +44(0)207 882 3058 | mobile: +44(0)772 992 0401 | twitter: @RJABuggs

bjclavijo commented 11 years ago

Yes, it is based on the 4 runs available on the FTP: 2 are MiSeq and 2 are HiSeq.

Stats as reported by abyss-fac are:

I am aware of duplication and incorrect copy numbers in the assembly due to heterozygosity, but importantly I think most of the unique content is assembled to a relatively good standard. That means that if a gene IS in the genome, it will be assembled, but it might appear more times than it should.

This is a starting point and I will obviously keep working on it, but this really was a first-pass assembly that I did as part of the data analysis on the runs more than anything else, with little to no tweaking. We are releasing it because we think it's good enough for a lot of analyses, and our computing platform allowed us to do it quickly while some other people might not have the resources.

If you (or anyone else) have specific concerns or questions, please write to me at my TGAC email and we can work it out together.

Cheers,

bj

(Replying to RichardBuggs, 15 May 2013, 12:09.)

RichardBuggs commented 11 years ago

Hi Bernardo,

We feel pretty cautious about our 454 assembly so far, but, like you, we felt it was worth releasing ASAP.

I think perhaps you forgot to paste the stats into your email.

Have you run the assembly through the CEGMA pipeline to see how many core eukaryotic genes you hit?

best wishes

Richard

(Replying to Bernardo Clavijo, 15 May 2013, 13:27.)

bjclavijo commented 11 years ago

Hi Richard, sorry, the stats were attached as an image. I'm pasting them as text here; I hope the fonts don't mess up the display:

409503  387267  62063  200  2801  5911  11876  123139  1.282e9  Fraxinus_excelsior_Nornex_s1v1-contigs.fa
249580  249580  48860  392  3526  7138  15616  315237  1.285e9  Fraxinus_excelsior_Nornex_s1v1-scaffolds.fa
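The two rows above are abyss-fac output, pasted without their header row. To illustrate what such columns mean, here is a minimal Python sketch that computes comparable contiguity statistics from a FASTA file. It assumes the usual abyss-fac column order (n, n:<minimum length>, n:N50, min, N80, N50, N20, max, sum, file name), and it assumes sequences shorter than the cut-off (200 bp here, consistent with the min column of the contigs row) are excluded from every column except n. The function names are hypothetical and this is not ABySS's own implementation, so small differences from real abyss-fac output are possible.

```python
from typing import Dict, Iterable, List, Tuple


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each record in a FASTA text stream."""
    lengths: List[int] = []
    current = None  # stays None until the first '>' header is seen
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def nx(sorted_desc: List[int], total: int, pct: int) -> Tuple[int, int]:
    """Length at which the cumulative sum (longest sequences first) reaches
    pct% of total, plus the number of sequences needed to get there."""
    running = 0
    for count, length in enumerate(sorted_desc, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the boundary.
        if running * 100 >= pct * total:
            return length, count
    return 0, 0


def fac_stats(lengths: List[int], min_len: int = 200) -> Dict[str, int]:
    """abyss-fac-like summary (assumed semantics): sequences shorter than
    min_len are ignored for every column except n."""
    kept = sorted((x for x in lengths if x >= min_len), reverse=True)
    total = sum(kept)
    n50, n_at_n50 = nx(kept, total, 50)
    return {
        "n": len(lengths),
        f"n:{min_len}": len(kept),
        "n:N50": n_at_n50,
        "min": kept[-1] if kept else 0,
        "N80": nx(kept, total, 80)[0],
        "N50": n50,
        "N20": nx(kept, total, 20)[0],
        "max": kept[0] if kept else 0,
        "sum": total,
    }


if __name__ == "__main__":
    demo = fasta_lengths([">a", "ACGTACGT", ">b", "ACG", ">c", "ACGTA"])
    print(fac_stats(demo, min_len=4))
```

Under these assumptions, `fac_stats(fasta_lengths(open("Fraxinus_excelsior_Nornex_s1v1-contigs.fa")))` should give a row comparable to the contigs line. Because N80 is reached further down the length-sorted list than N50, and N20 earlier, the expected ordering is N80 ≤ N50 ≤ N20, as in both rows above.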

I haven't had time to put it through CEGMA yet; releasing soon was the priority. But I'll do it (or somebody else will... this is crowdsourcing after all :P). Having said that, I think we'll be OK on gene presence, but there will be n-plications (a typical result for a heterozygous assembly; as you can see, the total bp is higher than the genome size, so this is expected).

Cheers,

bj

(Replying to RichardBuggs, 15 May 2013, 15:25.)

danmaclean commented 11 years ago

Hi Bernardo, I'm going to put your stats on the wiki page for this entry. Just so you know, you can use Markdown to format comments nicely on GitHub.

RichardBuggs commented 11 years ago

I have added CEGMA analysis results to the wiki. These are a bit better than those for the 454 assembly at ashgenome.org.

Richard

(Replying to Dan MacLean, 16 May 2013, 13:33.)