ComparativeGenomicsToolkit / hal

Hierarchical Alignment Format

HalPhyloPTrain.py error #203

Open SimonaSecomandi opened 3 years ago

SimonaSecomandi commented 3 years ago

Hi all,

I'm trying to generate a neutral model for my Cactus alignment, but I can't work out how the input file (the exon annotation for hal4dExtract) has to be generated. I know it has to be in BED12 format. I tried to produce it by downloading the GFF annotation file for my species and then converting it to BED12 with these UCSC utilities:

./gff3ToGenePred GCF_015227805.1.gff GCF_015227805.1.genePred
./genePredToBed GCF_015227805.1.genePred GCF_015227805.1.bed

Then I passed the BED file to halPhyloPTrain.py:

halPhyloPTrain.py --numProc 32 --substMod REV --noAncestors 9_genomes.hal Hirundo_rustica GCF_015227805.1.bed 4d_neutral_model.mod

This is the failure:

hal exception caught: Error parsing BED blockStarts: unplaced_BUSCO_68766at7742 43949 55624 rna-XM_040054516.1 0 - 43949 55613 0 19 69,108,39,47,10,114,74 in input bed line 1113,60,28,140,77,124,648, 0,791,1454,1495,2111,2122,2238,2969,3115,3592,3914,4607,5422,5972,6034,6484,7263,8158,11027,
Traceback (most recent call last):
  File "/home/users/simona.secomandi/Hirundo/bin/hal/bin/halPhyloPTrain.py", line 260, in <module>
    sys.exit(main())
  File "/home/users/simona.secomandi/Hirundo/bin/hal/bin/halPhyloPTrain.py", line 257, in main
    computeModel(args)
  File "/home/users/simona.secomandi/Hirundo/bin/hal/bin/halPhyloPTrain.py", line 133, in computeModel
    extractGeneMAFs(options)
  File "/home/users/simona.secomandi/Hirundo/bin/hal/bin/halPhyloPTrain.py", line 47, in extractGeneMAFs
    options.hal, options.refGenome, bedFile, bedFile4d))
  File "/gpfs/home/users/simona.secomandi/Hirundo/bin/hal/stats/halStats.py", line 28, in runShellCommand
    (command, sts))
RuntimeError: Command: hal4dExtract ../../9_genomes.hal Hirundo_rustica GCF_015227805.1_bHirRus1.pri.v2_genomic_CORRECT.gff.genePred.bed 4d_neutral_model_halPhyloPTrain_temp_GWTABCE_GCF_015227805.1_bHirRus1.pri.v2_genomic_CORRECT.gff.genePred4d.bed exited with non-zero status 1

Could you please provide an example of the required BED12 file, or explain how to obtain it? I suspect the BED file is the problem here.

Many thanks,

Simona

diekhans commented 3 years ago

hal exception caught: Error parsing BED blockStarts: unplaced_BUSCO_68766at7742 43949 55624 rna-XM_040054516.1 0 - 43949 55613 0 19 69,108,39,47,10,114,74 in input bed line 1113,60,28,140,77,124,648, 0,791,1454,1495,2111,2122,2238,2969,3115,3592,3914,4607,5422,5972,6034,6484,7263,8158,11027,

This error is caused by the number of comma-separated values in blockStarts not matching the specified number of blocks (blockCount). The trailing comma is handled.
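The check described above can be approximated outside of hal to locate suspicious lines before running hal4dExtract. The following is a minimal sketch, not hal's own parser: the function names `check_bed12_line` and `check_bed12_file` are hypothetical, and it assumes the fields are separated by tabs or spaces and that empty entries produced by a trailing comma should be ignored.

```python
def check_bed12_line(line):
    """Return a list of problems found in one BED12 line (empty list if none)."""
    # Assumption: fields are separated by tabs or spaces.
    fields = line.split()
    if len(fields) != 12:
        return ["expected 12 fields, found %d" % len(fields)]
    try:
        block_count = int(fields[9])
    except ValueError:
        return ["blockCount %r is not an integer" % fields[9]]
    problems = []
    for name, value in (("blockSizes", fields[10]), ("blockStarts", fields[11])):
        # Ignore empty entries so that a trailing comma is not counted.
        entries = [v for v in value.split(",") if v]
        if len(entries) != block_count:
            problems.append("%s has %d entries but blockCount is %d"
                            % (name, len(entries), block_count))
    return problems


def check_bed12_file(path):
    """Return (line_number, problems) pairs for every suspicious line in a file."""
    report = []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            # Skip blank lines, comments and UCSC track/browser headers.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            problems = check_bed12_line(line)
            if problems:
                report.append((number, problems))
    return report
```

Calling `check_bed12_file("GCF_015227805.1.bed")` would return the line numbers whose blockSizes or blockStarts counts disagree with blockCount. An empty result would suggest the cause lies elsewhere.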

If you create a GFF with one transcript that re-creates this problem, we can figure it out.
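One way to build such a single-transcript GFF is to keep only the records that belong to the failing transcript. The sketch below is a rough illustration under stated assumptions: the helper names `parse_attributes` and `extract_transcript` are hypothetical, and it assumes that the BED name (here `rna-XM_040054516.1`) matches the `ID` attribute in the GFF3 and that child features point to it through `Parent`. It holds all records in memory, which is simple but not economical for a large file.

```python
def parse_attributes(column9):
    """Parse a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for part in column9.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def extract_transcript(gff_lines, transcript_id):
    """Return the GFF3 lines for one transcript, its parent gene and its children."""
    header = []
    records = []
    for line in gff_lines:
        if line.startswith("##gff-version"):
            header.append(line)  # keep the version directive if present
            continue
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        records.append((line, parse_attributes(cols[8])))

    # Collect the IDs of the gene(s) the transcript is attached to.
    gene_ids = set()
    for _, attrs in records:
        if attrs.get("ID") == transcript_id:
            gene_ids.update(attrs.get("Parent", "").split(","))
    gene_ids.discard("")

    kept = list(header)
    for line, attrs in records:
        parents = attrs.get("Parent", "").split(",")
        if (attrs.get("ID") == transcript_id
                or transcript_id in parents
                or attrs.get("ID") in gene_ids):
            kept.append(line)
    return kept
```

The returned lines can be written to a small file and passed through the same gff3ToGenePred and genePredToBed steps to see whether the problem reproduces in isolation.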

SimonaSecomandi commented 3 years ago

Many thanks for your help.

This is the BED12 line that causes the problem:

es_errore_bed12.bed.txt

This is the corresponding GFF for that transcript:

es_errore.gff.txt

Is this helpful? Maybe something went wrong when converting the entire GFF file into a BED12 file?

diekhans commented 3 years ago

Hmm, this looks OK to me. It appears to be an NCBI GFF. Can you give me the URL of the original?

SimonaSecomandi commented 3 years ago

Yes, it is a GFF file from NCBI for the barn swallow. Here's the link: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/015/227/805/GCF_015227805.1_bHirRus1.pri.v2/GCF_015227805.1_bHirRus1.pri.v2_genomic.gff.gz

Many thanks!