skoren closed this issue 6 years ago.
Hi Sergey,
Perhaps you can share the stdout? I can compare to the one we ran before pushing and try to find a divergence point.
Best, Olga
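Finding the divergence point Olga mentions comes down to locating the first line at which two stdout logs stop matching. Below is a minimal sketch of that comparison. The function names are hypothetical and not part of the pipeline. It assumes both logs are plain or gzipped text. Lines that legitimately differ between machines (timestamps, absolute paths) would need to be normalized first.

```python
import gzip
from itertools import zip_longest
from typing import List, Optional, Sequence


def read_lines(path: str) -> List[str]:
    """Read a log file (optionally gzipped, e.g. run.out.gz) into a list of lines."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return [line.rstrip("\n") for line in fh]


def first_divergence(ours: Sequence[str], theirs: Sequence[str]) -> Optional[int]:
    """Return the 0-based index of the first differing line, or None if the logs match.

    A log that ends early counts as diverging at the index where it runs out,
    because zip_longest pads the shorter sequence with None.
    """
    for i, (a, b) in enumerate(zip_longest(ours, theirs)):
        if a != b:
            return i
    return None
```

For example, `first_divergence(read_lines("reference.out"), read_lines("run.out.gz"))` would return the index of the first mismatching line. Here `reference.out` is a placeholder name for the maintainers' own log.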
Sure, output attached. run.out.gz
Sergey,
Something appears to go differently for you around step 8. Can you share the asm.7.cprops and asm.7.asm?
Best, Olga
step7.tar.gz
Here is everything matching *.7.*, except the .hic file, which was over 800 MB.
Hi Sergey,
It seems the megascaffold output in step 7 is inverted in your run compared to what we have in our history, and this leads to downstream differences. I will need some time to investigate what could cause this and whether I can replicate it. Could you tell me which versions of sort, awk and parallel you are using? Thank you,
Olga
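Two asm.7.asm files can be checked to see whether they differ only in this kind of inversion. Below is a minimal sketch. It assumes each line of an .asm file is one scaffold, written as whitespace-separated contig indices from the matching .cprops file, with a negative sign marking a contig in reverse orientation. Under that assumption, an inverted scaffold is the same list read from the other end, with every sign flipped. The helper names are hypothetical.

```python
from typing import List


def parse_asm_line(line: str) -> List[int]:
    """Parse one .asm scaffold line into signed contig indices (negative = reversed)."""
    return [int(tok) for tok in line.split()]


def invert(scaffold: List[int]) -> List[int]:
    """Read the scaffold from the other end: contig order reversed, every orientation flipped."""
    return [-c for c in reversed(scaffold)]


def compare_scaffolds(a: List[int], b: List[int]) -> str:
    """Classify two scaffolds as identical, inverted copies of each other, or different."""
    if a == b:
        return "identical"
    if a == invert(b):
        return "inverted"
    return "different"
```

Pairing the lines of two runs' asm.7.asm files and calling `compare_scaffolds` on each pair would show whether the megascaffold differs only in orientation.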
$ sort --version
sort (GNU coreutils) 8.27
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and Paul Eggert.
$ parallel --version
GNU parallel 20150722
Copyright (C) 2007,2008,2009,2010,2011,2012,2013,2014,2015 Ole Tange
and Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
GNU parallel comes with no warranty.
Web site: http://www.gnu.org/software/parallel
When using programs that use GNU Parallel to process data for publication
please cite as described in 'parallel --bibtex'.
$ awk --version
GNU Awk 3.1.7
Copyright (C) 1989, 1991-2009 Free Software Foundation.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.
It seems my awk is older than the version listed in the README; perhaps that is the issue. I can try updating awk and see what happens.
I re-ran the assembly with awk 4.0.2 and now get the same assembly as in the paper, so some change between awk 3.1.7 and 4.0.2 appears to be causing the difference.
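A version check before launching a long run could catch this mismatch early. Below is a minimal sketch. It parses the first line of GNU awk's `awk --version` output (for example "GNU Awk 3.1.7") and compares it against 4.0.2, the version reported as working above. That threshold is an assumption: the README's stated minimum may differ. The check also only applies to GNU awk; other awk implementations format their version output differently.

```python
import re
import subprocess
from typing import Tuple

# Version reported as reproducing the paper's assembly in this thread (assumed threshold).
KNOWN_GOOD_AWK: Tuple[int, ...] = (4, 0, 2)


def parse_gawk_version(first_line: str) -> Tuple[int, ...]:
    """Extract the numeric version from a line like 'GNU Awk 3.1.7' -> (3, 1, 7)."""
    match = re.search(r"GNU Awk (\d+(?:\.\d+)*)", first_line)
    if match is None:
        raise ValueError(f"not a GNU Awk version line: {first_line!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_at_least(version: Tuple[int, ...], minimum: Tuple[int, ...]) -> bool:
    """Tuple comparison orders versions component by component."""
    return version >= minimum


def installed_gawk_version() -> Tuple[int, ...]:
    """Run `awk --version` and parse its first line (GNU awk only)."""
    out = subprocess.run(["awk", "--version"], capture_output=True, text=True, check=True).stdout
    return parse_gawk_version(out.splitlines()[0])
```

With the output pasted above, `is_at_least(parse_gawk_version("GNU Awk 3.1.7"), KNOWN_GOOD_AWK)` evaluates to `False`.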
Sergey,
Thank you for letting us know.
With best wishes,
Olga
I've downloaded and run the software and can successfully reproduce the NA12878 assembly. However, I can't reproduce the AaegL4 assembly from the paper using the publicly available data. I ran the following command:
The md5s for the inputs are:
As far as I can tell, the pipeline runs without error, but I end up with one unsplit scaffold of 1.1 Gbp and all the rest under 2 Mbp, rather than the expected three chromosome-length scaffolds. I am guessing something is failing in the Rabl splitting code, but I didn't see any obvious errors in the output. Let me know if there are intermediate files or program output I can share to help diagnose this issue.
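The symptom can be quantified by summarizing scaffold sizes from a .cprops/.asm pair. Below is a minimal sketch. It assumes each .cprops line reads "name index length" and each .asm line lists one scaffold as signed contig indices. Scaffold length is taken as the sum of its contig lengths, and any gaps inserted between contigs are not counted. The helper names and the choice of which round's files to inspect are left open and are hypothetical.

```python
from typing import Dict, Iterable, List


def read_cprops(lines: Iterable[str]) -> Dict[int, int]:
    """Map contig index -> length, assuming each line is 'name index length'."""
    lengths: Dict[int, int] = {}
    for line in lines:
        fields = line.split()
        if fields:
            lengths[int(fields[1])] = int(fields[2])
    return lengths


def scaffold_lengths(asm_lines: Iterable[str], contig_len: Dict[int, int]) -> List[int]:
    """Sum contig lengths per .asm line (orientation ignored, gaps not counted), largest first."""
    sizes = []
    for line in asm_lines:
        indices = [abs(int(tok)) for tok in line.split()]
        if indices:
            sizes.append(sum(contig_len[i] for i in indices))
    return sorted(sizes, reverse=True)


def count_at_least(sizes: List[int], min_bp: int) -> int:
    """Count scaffolds at or above a size threshold in bp."""
    return sum(1 for s in sizes if s >= min_bp)
```

Consider `count_at_least(sizes, 2_000_000)`. For the outcome described above, it would return one rather than the three chromosome-length scaffolds expected.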