Open beoungl opened 2 years ago
The error reports that a contig was observed with an end point that is less than the genomic chromosome's start point. Most likely, it indicates that you have files from different reference genome builds. Are your bigwig files mapped to hg38? Do they use the same chromosome labels as the fasta file you're using? If that all looks OK, I would use pdb or print statements to figure out exactly what's confusing the program at the point it returns that error.
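One way to check the chromosome labels and lengths is to compare the name-to-length table from the FASTA index against the one from each bigWig. The sketch below is only illustrative: the helper names `read_fai_sizes` and `compare_chrom_sizes` are hypothetical and not part of basenji, and it assumes a samtools-style `.fai` index exists next to the FASTA file.

```python
def read_fai_sizes(fai_path):
    """Read chromosome sizes from a samtools-style .fai index.

    Each line is tab-separated; the first two columns are the
    sequence name and its length.
    """
    sizes = {}
    with open(fai_path) as fai_in:
        for line in fai_in:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                sizes[fields[0]] = int(fields[1])
    return sizes


def compare_chrom_sizes(fasta_sizes, track_sizes):
    """Compare two {chromosome: length} dicts.

    Returns (missing, mismatched):
      missing    - labels in the track that the FASTA does not have
      mismatched - labels present in both but with different lengths
    """
    missing = sorted(c for c in track_sizes if c not in fasta_sizes)
    mismatched = sorted(
        c for c in track_sizes
        if c in fasta_sizes and track_sizes[c] != fasta_sizes[c]
    )
    return missing, mismatched
```

If pyBigWig is installed, `pyBigWig.open(path).chroms()` returns a dict of the same shape, which could be passed as `track_sizes`. A length mismatch on a major chromosome such as chr1 is a strong hint that the files come from different genome builds.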
I encountered the same error as you. When I removed -g umap_macro.bed, the error disappeared. Is it possible that the above file is based on hg19 and your fasta file is hg38?
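A quick sanity check on the BED file is to look for intervals that use unknown chromosome labels or extend past the end of a chromosome in the FASTA. This is a rough sketch under assumptions: `find_out_of_range_bed_intervals` is a hypothetical helper, not part of basenji, and `chrom_sizes` is a `{name: length}` dict built from the FASTA (for example from its `.fai` index).

```python
def find_out_of_range_bed_intervals(bed_lines, chrom_sizes):
    """Flag BED intervals that do not fit the given chromosome sizes.

    bed_lines   - iterable of BED lines (chrom, start, end, ...)
    chrom_sizes - dict mapping chromosome name to its length
    Returns a list of (line_number, chrom, start, end, reason) tuples.
    """
    problems = []
    for line_num, line in enumerate(bed_lines, start=1):
        line = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom not in chrom_sizes:
            reason = "unknown chromosome"
        elif start >= end:
            reason = "empty or inverted interval"
        elif end > chrom_sizes[chrom]:
            reason = "end past chromosome length"
        else:
            continue
        problems.append((line_num, chrom, start, end, reason))
    return problems
```

An empty result does not prove that the builds match, since most intervals from another build still fall within range; any hit, however, is worth investigating before changing the code.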
Also, all future development will occur in the baskerville repository (https://github.com/calico/baskerville). You can create training datasets using the analogous hound_train.py
I am running basenji_data.py on Micro-C data in bigWig format, and I ran into the following error:
stride_train 1 converted to 131072.000000
stride_test 1 converted to 131072.000000
I'm confused by this event ordering: gstart - cend
Here is the command I used:
basenji_data.py -d .1 -g unmap_macro.bed -l 131072 --local -o micro_c -p 8 -t .1 -v .1 -w 128 hg38.ml.fa heart_wigs.txt
and here are the contents of the heart_wigs.txt file:
index  identifier  file                      clip  sum_stat  description
0      Cancer_1    mapped_cancer1.PT.bigwig  384   sum       Cancer_1
1      Normal_1    mapped_normal1.PT.bigwig  384   sum       Normal_1
2      Cancer_2    mapped_cancer2.PT.bigwig  384   sum       Cancer_2
3      Cancer_3    mapped_cancer3.PT.bigwig  384   sum       Cancer_3
From the looks of it, this part of the code seems to be causing the issue here.
https://github.com/calico/basenji/blob/master/basenji/genome.py
Is it OK for me to change the code directly so that it accepts the gstart - cend combination, or was there a specific reason for handling it that way?