GMOD / jbrowse

JBrowse 1, a full-featured genome browser built with JavaScript and HTML5. For JBrowse 2, see https://github.com/GMOD/jbrowse-components.
http://jbrowse.org

JBrowse-1.12.3 prepare-refseqs.pl not making refSeqs.json #930

Closed mathog closed 6 years ago

mathog commented 7 years ago

Installed 1.12.3 yesterday and the "volvox" directory works OK. Tried to install my data from a Maker2 run but that did not work.

cd /var/www/html/mathog
#account used has full access to this directory
mkdir Lv1
bin/flatfile-to-json.pl --gff /home/mathog/maker_work/Lv/maker_pass4.all.gff \
   --out Lv1 --trackType MakerFeatures --trackLabel mygff
nice bin/prepare-refseqs.pl \
  --fasta /home/mathog/maker_work/Lv/dedup.genome.scf.fasta \
  --out Lv1 >/tmp/prepare-refseqs.log 2>&1 &
# log file is empty, no refSeqs.json is created
touch Lv1/tracks.conf
#copied a trackList.json from another machine, which does work to Lv1/trackList.json
#edited trackList.json to match configuration here
rm -rf Lv1/seq
nice bin/prepare-refseqs.pl \
  --fasta /home/mathog/maker_work/Lv/dedup.genome.scf.fasta \
  --out Lv1 >/tmp/prepare-refseqs.log 2>&1 &
# log file is empty, no refSeqs.json is created

The Lv1/seq directory seems to be populated normally, it just doesn't make the refSeqs.json file. There isn't even an empty one created there.

This also doesn't work, probably a related problem:

#emits: No reference sequences defined in configuration, nothing to do.

Hard to believe it is an issue with the input fasta headers, as those are straight out of MaSuRCA (from CA (AKA WGS)) and are like this:

head -1 dedup.genome.scf.fasta | od -c
0000000   >   s   c   f   7   1   8   0   0   0   0   5   5   8   9   9
0000020   4  \n

If only the first 10 sequences are run, it does create a refSeqs.json file. Maybe there is something toxic in one of the 32916 input sequence scaffolds? 27842 of them made it into the seq directory. The highest numbered one, and maybe the last processed, was scf7180000590992, but scf7180000590993 doesn't look at all peculiar to me (length 164168), and it is absent. There were longer ones that were processed, like scf7180000559690 at 269252 bp. There are no empty sequences; the shortest is 1004 bp.
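The sanity checks described above (how many scaffolds, whether any are empty, how short the shortest is) can be reproduced with a short awk pass over the FASTA file. This is only a sketch: `fasta_summary` is a hypothetical helper name, not a JBrowse or MaSuRCA tool, and it assumes plain uncompressed FASTA with `>` header lines.

```shell
# fasta_summary FILE
# Hypothetical helper (not part of JBrowse): prints the number of records,
# the number of empty records, and the name and length of the shortest one.
fasta_summary() {
  awk '
    # Close out the record that was being read, if any.
    function finish_record() {
      if (!in_record) return
      n++
      if (len == 0) empty++
      if (n == 1 || len < min) { min = len; minname = name }
    }
    # Header line: the record name is the first word after ">".
    /^>/ { finish_record(); in_record = 1; name = substr($1, 2); len = 0; next }
    # Sequence line: strip a trailing CR (DOS line endings) and add its length.
    in_record { sub(/\r$/, ""); len += length($0) }
    END {
      finish_record()
      printf "records=%d empty=%d shortest=%s(%d)\n", n, empty, minname, min
    }
  ' "$1"
}

# Example call, using the file name from this thread:
#   fasta_summary dedup.genome.scf.fasta
```

If the record count and shortest length agree with what is expected, the input itself is probably not the problem and the environment (permissions, disk space) is the next thing to look at.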

This issue was also posted to the Gmod-ajax mailing list on sourceforge.

Thanks.

mathog commented 7 years ago

A test case with a single entry works with maker2jbrowse:

#make test case
cd ~/maker_work/Lv
grep scf7180000563054 maker_pass4.all.gff > scf7180000563054.gff
fastarange 3521 3521 <dedup.genome.scf.fasta >scf7180000563054.fasta
echo "##FASTA" >>scf7180000563054.gff
cat scf7180000563054.fasta >>scf7180000563054.gff
#install it
cd /var/www/html/mathog/JB*
bin/maker2jbrowse --out Lv2 /home/mathog/maker_work/Lv/scf7180000563054.gff

It creates seq (with refSeqs.json), names, and tracks. Moreover the entries in tracks are visible in the URL

http://machinename/mathog/JBrowse-1.12.3/?data=Lv2

if they are enabled. So that may be the way to go, unless it too blows up on the full set (running now). Wish I knew though why the other method failed.

mathog commented 7 years ago

That didn't work when it ran on the full data set:

bin/maker2jbrowse --out Lv3 /home/mathog/maker_work/Lv/maker_pass4.all.gff \
  >/tmp/maker2jbrowse.log 2>&1 &

it blew up while filling seq, leaving no messages in the log file.

Is it possible to turn on debugging messages?

enuggetry commented 7 years ago

I'm not familiar with the maker2jbrowse codebase. If you are still having problems with the fasta file, can you share it?

mathog commented 7 years ago

On 05-Oct-2017 21:34, Eric Y wrote:

> I'm not familiar with the maker2jbrowse codebase. If you are still having problems with the fasta file, can you share it?

Tried to and uncovered the underlying problem. This command

cp bad.fasta /var/www/html/mathog

failed with an "out of space".

It turns out that when this system was originally configured, only 51 GB was allocated to the "/" file system, and when apache2 was installed later it left /var/www on that small file system. So there wasn't much space to store data for the web server, and the "tracks" directory for one genome used up most of it, leaving just enough to do part of "seq".
The jbrowse perl scripts must not have been checking return status or passing stderr through, at least not everywhere, so the "out of space" message was not seen when that condition was encountered.
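Because the commands earlier in this thread were started with `&`, their exit status was never looked at either. A minimal sketch of a wrapper that keeps the log file but also reports the exit status is shown here. `run_logged` is a hypothetical name, not a JBrowse tool, and it only helps if the script actually exits non-zero on failure, which this thread does not confirm.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical wrapper (not part of JBrowse): runs COMMAND in the background
# with stdout and stderr sent to LOGFILE, waits for it, and reports a
# non-zero exit status on stderr. Plain POSIX sh has no "local", so the
# variables below are global.
run_logged() {
  log=$1
  shift
  "$@" >"$log" 2>&1 &
  pid=$!
  wait "$pid"          # wait returns the exit status of the background job
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "command failed with exit status $status; see $log" >&2
  fi
  return "$status"
}

# Example call, using the paths from this thread:
#   run_logged /tmp/prepare-refseqs.log nice bin/prepare-refseqs.pl \
#     --fasta /home/mathog/maker_work/Lv/dedup.genome.scf.fasta --out Lv1
#   [ -s Lv1/seq/refSeqs.json ] || echo "refSeqs.json missing or empty" >&2
```

The final `[ -s ... ]` test checks for the expected output file directly, which catches the failure even when the script exits with status 0 and prints nothing.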

Moved /var/www to /home (9.5 TB file system) and tried maker2jbrowse again, and this time it seems to be working. It completed "seq" and made the refSeqs.json file, proving that there was actually nothing toxic in that fasta file.

This CentOS system was remarkably understated for an OS with a full root file system. The only thing in dmesg to indicate a problem was:

Process accounting paused

and /var/log/messages just had

auditd[3953]: Audit daemon is low on disk space for logging
auditd[3953]: Audit daemon is suspending logging due to low disk space

It ran fine, other than not being able to store anything more there.
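Since the system logs gave so little warning, checking free space on the target file system before a long run is a cheap safeguard. The following is a sketch only: `free_kb` and `require_free_kb` are hypothetical helper names, the threshold is whatever the caller chooses, and the parsing assumes the file system name printed by `df` contains no spaces.

```shell
# free_kb DIR
# Hypothetical helper: prints the free space, in 1024-byte blocks, of the
# file system holding DIR. -P selects the POSIX output format (one line per
# file system) and -k selects 1024-byte units; column 4 is "Available".
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# require_free_kb DIR MIN_KB
# Returns 0 if at least MIN_KB is free under DIR, 1 if not, and 2 if the
# free space could not be determined.
require_free_kb() {
  dir=$1
  need=$2
  have=$(free_kb "$dir")
  [ -n "$have" ] || return 2
  if [ "$have" -lt "$need" ]; then
    echo "only $have KB free under $dir, need $need KB" >&2
    return 1
  fi
  return 0
}

# Example call (the 50 GB threshold is an arbitrary illustration):
#   require_free_kb /var/www/html/mathog 52428800 && bin/maker2jbrowse ...
```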

Regards,

David Mathog
mathog@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

enuggetry commented 7 years ago

Better error reporting probably in order, here. :)