Open EarlyEvol opened 5 years ago
Hi Earl,
Sorry for delay, was traveling. Can you please clarify if you are running in haploid or diploid mode?
Thanks, Olga
Olga,
No problem on the delay, I am also traveling right now.
I ran 3d-dna in default mode, so should be haploid mode. Here is the code I used.
SAMPLE=GanF1
wd=~/WorkingDir/assemblies/$SAMPLE\_juicer
nodup=$wd/aligned/merged_nodups.txt
ref=$wd/references/$SAMPLE\_genome.srt.fasta
threads=26
##### stuff to prepare directory #####
mkdir $wd
mkdir $wd/3d.2
cd $wd/3d.2
bash ~/apps/3d-dna/run-asm-pipeline.sh $ref $nodup
Also, it looks like running the whole pipeline with shorter contigs from ABySS works, although only about 50% of the assembly gets into the final chromosome-level scaffolds. That scaffolding might be useful to confirm the PacBio Canu scaffolding later on.
Side question: Is there any tuning required for very high AT content genomes? My wasps are about 29% GC.
Thanks, Earl
Earl,
This is such unusual behavior (I have not had it reported before) that I think I won't be able to say anything unless I try to reproduce it. If you can privately share the mnd and fasta (or cprops; for this stage I don't actually need the fasta) so that I can see what's going on for myself, that would perhaps be the fastest way forward. I understand you are experiencing problems with two independent assemblies, correct?
Re your question about AT: there are general recommendations on tuning error correction for libraries with varying coverage, which is what one might expect here. I'd look into those.
Best, Olga
Yes, two assemblies are having trouble related to the .hic encoded genome length. I'm happy to share the files with you. What is your preferred method?
Thanks so much, Earl
Earl, If you could host and send credentials for access to olga.dudchenko@bcm.edu, that perhaps would be best. Thanks! Olga
Olga,
Now I have to apologize for the delay. I just got back from some conference travels. I have sent a link to your email with the merged_nodups and the cprops files. There are also some other output files from 3d-dna. The rawchrom.hic is the first to show the odd behavior.
Thanks for all your help.
Earl
Earl,
One thing I can already see is that 3d-dna is not actually running for you: if you open the .0.hic file, it looks just like a visualization of the draft (you can confirm that by looking at .0.assembly). Something is preventing the scaffolder from running on your specific system, and I can't tell you what, since I can't reproduce it. I'm attaching what the .0.assembly should look like when properly run. What happens on this system later seems less meaningful to discuss.
https://www.dropbox.com/s/i4f2brsngzvaj9g/Earl.0.assembly?dl=0
Best, Olga
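One way to do the .0.assembly check without opening Juicebox is sketched below. It assumes the usual .assembly layout, where lines starting with ">" describe contigs and every other non-empty line is one scaffold written as space-separated signed contig indices. The function name and the toy input are hypothetical and only for illustration.

```shell
#!/usr/bin/env bash
# Count scaffold lines in a .assembly file and how many of them join
# more than one contig. Assumed layout: ">" lines describe contigs;
# every other non-empty line is one scaffold (signed contig indices).
count_joined_scaffolds() {
    awk '
        /^>/   { next }                            # skip contig description lines
        NF > 0 { total++; if (NF > 1) joined++ }   # one scaffold per line
        END    { printf "%d %d\n", total + 0, joined + 0 }
    ' "$1"
}

# Hypothetical toy input (not real data): three contigs, two scaffolds.
demo=$(mktemp)
cat > "$demo" <<'EOF'
>ctg1 1 1000
>ctg2 2 2000
>ctg3 3 1500
1 -3
2
EOF
count_joined_scaffolds "$demo"   # prints: <total scaffolds> <scaffolds with more than one contig>
rm -f "$demo"
```

If the second number comes back as 0 on a real .0.assembly, every scaffold is a single input contig, which would be consistent with the file still being the unscaffolded draft.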
Haha, well dang. That is a major problem. I added a log file (in the linked folder) from 3d-dna and I don't see anything that makes it look like scaffolding didn't run.
I'm running 3d-dna again on a different machine now. Hopefully I'll have different results tomorrow. Earl
Indeed the log is fine: it took the same number of iterations as the run I just did, so it was doing something. But the output file is different. Just in case, make sure you are running in a clean folder, in case this is some overwrite permission issue or who knows what.
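A minimal sketch of the clean-folder suggestion follows. The helper name make_clean_run_dir and the directory name 3d.clean are hypothetical; the idea is only to refuse to reuse a directory that already holds output from an earlier run and to confirm it is writable before starting.

```shell
#!/usr/bin/env bash
# Create a fresh, empty, writable run directory and print its path.
# Refuses to reuse a directory that already contains files.
make_clean_run_dir() {
    local dir=$1
    mkdir -p "$dir" || return 1
    if [ -n "$(ls -A "$dir")" ]; then
        echo "refusing to reuse non-empty directory: $dir" >&2
        return 1
    fi
    [ -w "$dir" ] || { echo "directory not writable: $dir" >&2; return 1; }
    echo "$dir"
}

# Usage sketch, reusing $wd, $ref and $nodup from Earl's script
# (the directory name 3d.clean is made up for this example):
#   run_dir=$(make_clean_run_dir "$wd/3d.clean") && cd "$run_dir" &&
#   bash ~/apps/3d-dna/run-asm-pipeline.sh "$ref" "$nodup" 2>&1 | tee 3d-dna.log
```

Keeping the log next to the outputs makes it easier to compare two runs that report the same number of iterations but produce different files.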
Hi,
I'm getting some pretty weird behavior from 3d-dna when trying to correct misjoins. In a nutshell, the .hic files from the polish step onward show the diagonal as 2x the assembly length, with all the extra connections having no data.
For another assembly, the .hic files show the correct input assembly length, but most of the connections are blank.
There are no obvious errors in the logging.
I have gotten this behavior out of a node on our cluster and a local workstation, so I don't think it is machine specific. I think all the prerequisites are the correct version.
Interestingly, it looks like running the pipeline with -r 0 outputs really nice scaffolds, with most apparent errors clearly due to misjoins from Canu. While I can manually curate a lot of misjoins, my genomes are pretty repetitive, and on some chromosomes the Hi-C map shows lots of checker patterns with 1 Mb units, so I would like to be able to go through a couple of rounds of misjoin correction to see what I get.
Any insight is of course greatly appreciated. Thanks for your time. Earl
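For anyone hitting a similar length mismatch, a quick sanity check is to compare the total length recorded in the cprops file with the total sequence length in the input fasta. The sketch below assumes the cprops file has three whitespace-separated columns (name, index, length); the function names are hypothetical. It does not diagnose the cause, it only shows whether the two inputs agree on the assembly length.

```shell
#!/usr/bin/env bash
# Total number of bases in a fasta file (header lines are skipped).
fasta_total_length() {
    awk '!/^>/ { gsub(/[ \t\r]/, ""); n += length($0) }
         END   { printf "%.0f\n", n + 0 }' "$1"
}

# Sum of the length column in a cprops file.
# Assumed layout: <name> <index> <length>, one contig per line.
cprops_total_length() {
    awk 'NF >= 3 { n += $3 } END { printf "%.0f\n", n + 0 }' "$1"
}

# Usage sketch (file names are placeholders):
#   fasta_total_length  genome.fasta
#   cprops_total_length genome.cprops
```

If the two totals differ, the mismatch is already present in the inputs and not something introduced later in the pipeline.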