Closed cgirardot closed 3 years ago
This is actually not a problem with the version of HiCExplorer, but as it seems from the Galaxy wrapper. I noticed that since version 1.8 the wrapper mutually excludes the bin size option and the restriction cut site file, which should not be done. Anyway, the really old version 1.7 seems to have not the mutual exclusion, maybe it is worth giving it a try with this version? However, the QC report might be missing, I am not sure, this is a HiCExplorer and Galaxy wrapper version before I started to work on this project and I would need to check the source code first for any statement.
However, your WF: It could be that a) the second sorting is causing an issue and b) that the output bam file is not having the datatype 'qname_sorted', if this is the case Galaxy sorts by default the file and causes this crash.
But yeah, I need to update the wrappers to the latest HiCExplorer version, but don't have that much time for it at the moment. Maybe I will try to fix it in between other stuff here and then.
This is interesting, as a quick fix I can also amend the wrapper locally and pass the restriction cut site file (after a command line test to make sure this works as expected).
I really don't see why the second sorting would cause an issue but I'll check the files manually as the error suggests files are not correctly sorted. As for (b), changing the type does not solve it (manual launch of the job).
What is supposed to be in the valid reads BAM ? Only mapped read pairs I assume ?
(b), changing the type does not solve it (manual launch of the job).
The problem is then that the output is defined as BAM and then Galaxy sorts it by default. To change the format after the output is placed, will not change this.
sorry I am being unclear. So far I am testing this and run the steps manually (no WF) so the result of the qname sorting using the Picard tool is qname sorted (it is actually SAM, pls see picture below). Then I manually execute hicBuildMatrix which fails complaining about unsorted reads; which I can only conceive if my BAMs (that I extract from the valid reads BAM) would contain different set of reads (which is why I set the extra filters above to make sure I only extract read pairs with both end mapped).
regarding your comment
This is actually not a problem with the version of HiCExplorer, but as it seems from the Galaxy wrapper. I noticed that since version 1.8 the wrapper mutually excludes the bin size option and the restriction cut site file, which should not be done
I read from hicBuildMatrix 3.4.3 help that it is either binSize or restrictionCutFile:
--restrictionCutFile BED file, -rs BED file
BED file with all restriction cut places (output of
"findRestSite" command). Should contain only mappable
restriction sites. If given, the bins are set to match
the restriction fragments (i.e. the region between one
restriction site and the next). Alternatively, a fixed
binSize can be defined instead. However, either
binSize or restrictionCutFile must be defined.
(default: None)
and indeed the command line does not accept it :
hicBuildMatrix: error: argument --binSize/-bs: not allowed with argument --restrictionCutFile/-rs
Alright, so you really need HiCExplorer version 3.6. I will try to find some time on the weekend to update the wrappers, but I don't know how long it might take until the Galaxy project accepts them + how long it will take it is installed on hicexplorer.usegalaxy.eu.
Best,
Joachim
that would be awesome, thank you. Note that we have our own local server so as soon as this is in the toolshed, we can deploy.
@joachimwolff I still wanted to understand why my approach is not working and there is something not adding up here.
I have checked the order of my 2 sam files with
paste <(awk '{print $1}' fwd_qnamesorted.sam) <(awk '{print $1}' rev_qnamesorted.sam) | awk '$1 != $2{print NR}'
and no difference is reported!
So I executed hicBuildMatrix command line, which errored with:
AssertionError: FATAL ERROR NB501598:1014:HH5TLBGXF:1:11101:10002:14630 NB501598:1014:HH5TLBGXF:1:11101:10002:2140 Be...```
and located the row number for NB501598:1014:HH5TLBGXF:1:11101:10002:14630
, this is actually right at the beginning as shown below (fifth read after last header):
(skipping header)
@PG ID:bwa PN:bwa
NB501598:1014:HH5TLBGXF:1:11101:10000:1294 81 chr3L
NB501598:1014:HH5TLBGXF:1:11101:10001:13024 65 chr2R
NB501598:1014:HH5TLBGXF:1:11101:10001:16048 65 chr2L
NB501598:1014:HH5TLBGXF:1:11101:10001:17608 65 chr2R
NB501598:1014:HH5TLBGXF:1:11101:10002:14630 81 chr4
NB501598:1014:HH5TLBGXF:1:11101:10002:2140 65 chr3L
and from mate
(skipping header)
@PG ID:bwa PN:bwa
NB501598:1014:HH5TLBGXF:1:11101:10000:1294 145 chr2R
NB501598:1014:HH5TLBGXF:1:11101:10001:13024 145 chr2R
NB501598:1014:HH5TLBGXF:1:11101:10001:16048 145 chr2L
NB501598:1014:HH5TLBGXF:1:11101:10001:17608 145 chr2R
NB501598:1014:HH5TLBGXF:1:11101:10002:14630 129 chr2R
NB501598:1014:HH5TLBGXF:1:11101:10002:2140 129 chrX
the reads are perfectly aligned to me! And I don't get why hicBuildMatric complains :-/
The updated wrappers as promised: https://github.com/galaxyproject/tools-iuc/pull/3481
Sorry, I did not have the time to go into detail about your workaround issue. I hope I can find some time next week for it.
Best,
Joachim
Thank you!
I am really sorry the PR on tools-iuc is still not merged. I don't know why the people care less about merging passing PRs, while other PRs seem to be accepted quite fast. Maybe you know someone with merge rights?
I know Bjorn but you pung him twice already.
I will have time over the weekend. But @cgirardot please feel free to have a look at the PR from @joachimwolff and give feedback.
I don't have much experience with writing wrappers for galaxy (or very simple ones). I unfortunately don't think I can really judge by reading the PR.
Dear, I am using galaxy and therefore the version 3.4.3 of hicexplorer. My problem would be solved with the hicexplorer 3.6 but all my workflows are in Galaxy so I am stuck. Here is the problem, maybe you can help.
I am mainly using binned matrices but I always build the restriction enzyme matrix as well. In particular the QC is more accurate as hicBuildMatrix running in this mode detects and filters more reads ie
With rest site file:
Bining (no rest site):
I am therefore interested to build a bin matrix with hicBuildMatrix but only using the reads that are deemed "good" from "hicBuildMatrix with restr. site file". For this, I thought I could use the "valid read" BAM files created by hicBuildMatrix.
The WF would look like this:
Unfortunately, the last step keep failing with
I am very confused as the BAM files I recreate from the "valid read" BAM file look OK. I made sure to only keep reads that are
Do you have an idea what could happen here? Have you ever tried a WF like this? Thx