Open alexlenail opened 6 years ago
@zfrenchee can you confirm that you are talking about https://github.com/galaxyproject/galaxy/blob/dev/tools/filters/bed_to_bigbed.xml
What was the error you got?
@bgruening Yes, that's the tool
The error message I get after I manually wget'ed the hg19.len file was:
column #10 isSizeLink do not match: Yours=[0] BED Standard=[1]
asObjects differ.
@zfrenchee your BED file seems to be at fault. The tool is simply using the UCSC conversion tool. Can you check your BED file? Also see here: https://groups.google.com/a/soe.ucsc.edu/forum/#!topic/genome/opwtvFfIslQ
@bgruening
The BED files I am trying to convert to BigBEDs are output narrowpeaks from MACS2, which is to say they are BED6+4 formatted. This seems to cause the tool to crash. Since converting MACS2 narrowpeaks to BigBEDs is part of the canonical ENCODE pipeline for ATAC-Seq, this tool should probably be able to handle this use case. jennaj recommended doing a sortBED in advance, but that did not solve the issue.
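For reference, BED6+4 means six standard BED columns followed by four extra ones (in the ENCODE narrowPeak layout: signalValue, pValue, qValue, peak). A minimal Python sketch for checking how many tab-separated columns a file actually has; the function name and the sample line are made up for illustration and are not taken from MACS2 output:

```python
import io

def count_bed_columns(handle):
    """Return the set of column counts seen across the data lines."""
    counts = set()
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and UCSC-style header/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        counts.add(len(line.split("\t")))
    return counts

# Hypothetical narrowPeak-style line with 6 standard + 4 extra columns.
sample = io.StringIO("chr1\t100\t200\tpeak_1\t500\t.\t12.5\t30.1\t27.4\t50\n")
print(count_bed_columns(sample))  # {10}
```

A result other than a single value (for example {6, 10}) would suggest a mixed or truncated file.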
Hi guys, looks like maybe the tool can't take macs2 narrowpeak as it needs to also use an autoSql file? see this post (and below) from the UCSC google group: https://groups.google.com/a/soe.ucsc.edu/forum/#!searchin/genome/narrowpeak$20bigbed/genome/wZsmrO9m0bg/gyO_KeRBAwAJ
narrowPeak files can be visualized directly on the UCSC Genome Browser as a custom track, no need to convert to bigBed first. However if you wanted to, example 3 on this page: http://genome.ucsc.edu/goldenPath/help/bigBed.html is a good example of how to convert a non-standard bed file to a bigBed file, in that you need to supply bedToBigBed with an autoSql file that describes your data. For more information see this question from our mailing list archive: https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/9PXjH2mlqrE/MrBs3pZ9WLEJ
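As a rough sketch of what that advice amounts to, the Python below only assembles a bedToBigBed argument list with the -type and -as options from the tool's usage message; it does not run anything. The helper name and all file names are hypothetical placeholders, and the exact -type value for a given file is an assumption to verify against the UCSC help page.

```python
import shlex

def bed_to_bigbed_command(bed_path, chrom_sizes, out_path,
                          as_path=None, bed_type=None):
    """Build (but do not run) a bedToBigBed command as an argument list."""
    cmd = ["bedToBigBed"]
    if bed_type:
        # e.g. "bed6+4": six standard BED fields plus four extra ones
        cmd.append(f"-type={bed_type}")
    if as_path:
        # autoSql file describing the columns
        cmd.append(f"-as={as_path}")
    # Positional arguments: input BED, chromosome sizes, output bigBed.
    cmd += [bed_path, chrom_sizes, out_path]
    return cmd

# Hypothetical file names, for illustration only.
cmd = bed_to_bigbed_command(
    "peaks.sorted.narrowPeak", "hg19.chrom.sizes", "peaks.bb",
    as_path="narrowPeak.as", bed_type="bed6+4",
)
print(shlex.join(cmd))
```

Keeping the command as a list makes it straightforward to hand to subprocess.run later without shell quoting problems.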
Ah the Galaxy BED-to-bigBed converter tool says that non-standard bed (that require autoSql files) are not currently supported:
Currently, the bedFields option to specify the number of non-standard fields is not supported as an AutoSQL file must be provided, which is a format currently not supported by Galaxy.
Many thanks for looking into this @mblue9. The strange part is that we use a command line tool which executes this without a problem; I was assuming the tools would be the same.
bedToBigBed a_sorted_bed_file.bed -bedFields=6 /path/to/chrom.sizes out_file_name.bed
(thanks @bwassie)
@bgruening @mblue9 if adding this functionality is impossible or impractical in the scope of the Galaxy BED-to-BigBED tool, please feel free to close this issue.
@zfrenchee are you using the ENCODE bedToBigBed tool? https://www.encodeproject.org/software/bedToBigBed/
as that says it can handle non-standard bed but also says it needs a .as file:
bedToBigBed takes a standard bed file or a non-standard bed file with associated .as file to create a compressed bigBed version
Just wondering if you had to cut out 6 columns from the macs2 file to get it to work without the .as file? How many columns does your sorted bed file (a_sorted_bed_file.bed) have?
Hi @mblue9, Yeah we're using the UCSC bedToBigBed tool. I just checked and it runs and produces an output when we use the standard macs2 narrowPeak file with 10 columns. I don't know if the bigBed file will visualize correctly on UCSC but it does run without an error.
Thanks for the info @bwassie. My guess is that if you don't use the .as file, the output may not view correctly in UCSC. Also, did you see there is a new format, bigNarrowPeak (announced in December by UCSC)? It also requires the bedToBigBed tool to be run with a .as file, see below.
Does anyone know if the Galaxy bedToBigBed wrapper could be changed to accept an autoSql (.as) file?
Below from http://genome.ucsc.edu/goldenPath/help/bigNarrowPeak.html:
bigNarrowPeak Track Format
The bigNarrowPeak format stores annotation items that are a single block with a single base peak within that block, much as BED files indexed as bigBeds do. A bigNarrowPeak file is a standard six field bed with four additional fields that contain three doubles with scoring information and the location of the single base peak. It is the binary version of the ENCODE narrowPeak or point-source peak format.
The bigNarrowPeak files are created using the program bedToBigBed, run with the -as option to pull in a special autoSql (.as) file that defines the extra fields of the bigNarrowPeak.
The bigNarrowPeak files are in an indexed binary format. The main advantage of this format is that only those portions of the file needed to display a particular region are transferred to the Genome Browser server. Because of this, indexed binary files have considerably faster display performance than regular BED format files when working with large data sets. The bigNarrowPeak file remains on your local web-accessible server (http, https or ftp), not on the UCSC server, and only the portion needed for the currently displayed chromosomal position is locally cached as a "sparse file". If you do not have access to a web-accessible server and need hosting space for your bigNarrowPeak files, please see the Hosting section of the Track Hub Help documentation.
bigNarrowPeak file definition
The following autoSql definition is used to specify bigNarrowPeak files. This definition, contained in the file bigNarrowPeak.as, is pulled in when the bedToBigBed utility is run with the -as=bigNarrowPeak.as option.
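As a rough sketch of what such a definition looks like, the Python below renders an autoSql table for the ten narrowPeak columns. The field order follows the ENCODE narrowPeak layout; the types and comment strings are my approximation and are not copied from UCSC's bigNarrowPeak.as, so the file from the UCSC help page should be preferred for real use. The helper name render_autosql is hypothetical.

```python
# Approximate field list: (autoSql type, field name, short comment).
# Assumption: not a verbatim copy of UCSC's bigNarrowPeak.as.
FIELDS = [
    ("string", "chrom", "Reference sequence chromosome or scaffold"),
    ("uint", "chromStart", "Start position in chromosome"),
    ("uint", "chromEnd", "End position in chromosome"),
    ("string", "name", "Name given to a region"),
    ("uint", "score", "Display score (0-1000)"),
    ("char[1]", "strand", "+ or - or . for unknown"),
    ("float", "signalValue", "Measurement of enrichment for the region"),
    ("float", "pValue", "Significance of signal value (-log10)"),
    ("float", "qValue", "Significance after multiple-test correction"),
    ("int", "peak", "Peak summit as an offset from chromStart"),
]

def render_autosql(table, description, fields):
    """Render an autoSql table declaration as text."""
    lines = [f"table {table}", f'"{description}"', "("]
    for ftype, name, comment in fields:
        lines.append(f'    {ftype} {name}; "{comment}"')
    lines.append(")")
    return "\n".join(lines) + "\n"

text = render_autosql("bigNarrowPeak", "BED6+4 peaks of signal enrichment", FIELDS)
print(text)
```

The six leading fields are the standard BED6 columns; the last four are the extras that the .as file has to describe for bedToBigBed.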
The first time I tried to run this, I got Couldn't open /galaxy-central/tool-data/shared/ucsc/chrom/hg19.len. As it turns out, there was no folder /galaxy-central/tool-data/shared/ucsc/chrom/. I decided to just wget it. However, this should probably be distributed with Galaxy going forwards?
The BED files I am trying to convert to BigBEDs are output narrowpeaks from MACS2, which is to say they are BED6+4 formatted. This seems to cause the tool to crash. Since converting MACS2 narrowpeaks to BigBEDs is part of the canonical ENCODE pipeline for ATAC-Seq, this tool should probably be able to handle this use case. @jennaj recommended doing a sortBED in advance, but that did not solve the issue.
Offshoot of Galaxy 17.05, Docker container from @bgruening