bioinfo-pf-curie / TMB

Tumor Mutational Burden

pyEffGenomeSize throws error - single positional indexer is out-of-bounds #18

Open jpcartailler opened 1 year ago

jpcartailler commented 1 year ago

Greetings,

Command used:

pyEffGenomeSize.py \
  --gtf gencode.v20.annotation.sorted.gtf \
  --filterNonCoding \
  --bed Twist_ILMN_Exome_2.0_Plus_Panel.hg38.sorted.bed
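One quick sanity check before running the script is whether both inputs use the same chromosome naming, for example `chr1` in both files rather than `chr1` in one and `1` in the other. This is an assumption about a common cause of empty intersections; the output below does not confirm it. The sketch uses only the Python standard library. `chrom_names` and `compare_chroms` are hypothetical helpers and are not part of the TMB repository.

```python
from __future__ import annotations

from collections.abc import Iterable


def chrom_names(lines: Iterable[str]) -> set[str]:
    """Return the chromosome names (first column) found in BED or GTF lines.

    Blank lines and header lines starting with "#", "track" or "browser"
    are skipped.
    """
    names: set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split(None, 1)[0])
    return names


def compare_chroms(gtf_lines: Iterable[str], bed_lines: Iterable[str]) -> dict[str, set[str]]:
    """Report which chromosome names the GTF and BED share, and which they don't."""
    gtf = chrom_names(gtf_lines)
    bed = chrom_names(bed_lines)
    return {"shared": gtf & bed, "only_gtf": gtf - bed, "only_bed": bed - gtf}
```

Both functions accept any iterable of lines, so open file handles can be passed directly: `compare_chroms(open("gencode.v20.annotation.sorted.gtf"), open("Twist_ILMN_Exome_2.0_Plus_Panel.hg38.sorted.bed"))`. An empty `shared` set would mean the two files cannot overlap at all.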

Input files (prior to sorting)

Both the original GENCODE and BED files were sorted (as required by pyEffGenomeSize.py):

bedtools sort -i gencode.v20.annotation.gtf > gencode.v20.annotation.sorted.gtf
bedtools sort -i Twist_ILMN_Exome_2.0_Plus_Panel.hg38.bed > Twist_ILMN_Exome_2.0_Plus_Panel.hg38.sorted.bed
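To confirm that the sort step produced what the script expects, you can check two conditions: each chromosome should form one contiguous block, and start coordinates should never decrease within that block. This is the grouping that `bedtools sort` produces by default (chromosome, then start). The check below does not assume any particular chromosome order. This is a minimal sketch with a hypothetical helper name. The start coordinate is column index 3 (0-based) in a GTF and index 1 in a BED.

```python
from __future__ import annotations

from collections.abc import Iterable


def first_unsorted_line(lines: Iterable[str], start_col: int) -> int | None:
    """Return the 1-based line number of the first record that breaks sort order.

    start_col is the 0-based column holding the start coordinate
    (3 for GTF, 1 for BED). Returns None if the input looks sorted.
    """
    seen_chroms: set[str] = set()
    current_chrom = None
    last_start = -1
    for lineno, line in enumerate(lines, start=1):
        # Skip blank and header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start = fields[0], int(fields[start_col])
        if chrom != current_chrom:
            if chrom in seen_chroms:
                return lineno  # chromosome reappears after a different one
            seen_chroms.add(chrom)
            current_chrom, last_start = chrom, -1
        if start < last_start:
            return lineno  # start coordinate went backwards within a chromosome
        last_start = start
    return None
```

For example, `first_unsorted_line(open("gencode.v20.annotation.sorted.gtf"), start_col=3)` should return `None` if the sort worked.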

The script runs for a minute or so, prints the output below, and then fails with the following error:

chr1    HAVANA  exon    69091   70008   .       +       .       gene_id "ENSG00000186092.4";transcript_id "ENST00000335137.3";gene_type "protein_coding";gene_status "KNOWN";gene_name "OR4F5";transcript_type "protein_coding";transcript_status "KNOWN";transcript_name "OR4F5-001";exon_number 1;exon_id "ENSE00002319515.1";level 2;protein_id "ENSP00000334393.3";tag "CCDS";ccdsid "CCDS30547.1";havana_gene "OTTHUMG00000001094.1";havana_transcript "OTTHUMT00000003223.1";
 chr1   HAVANA  exon    139790  139847  .       -       .       gene_id "ENSG00000239906.1";transcript_id "ENST00000493797.1";gene_type "antisense";gene_status "NOVEL";gene_name "RP11-34P13.14";transcript_type "antisense";transcript_status "KNOWN";transcript_name "RP11-34P13.14-001";exon_number 2;exon_id "ENSE00001922992.1";level 2;tag "basic";havana_gene "OTTHUMG00000002481.1";havana_transcript "OTTHUMT00000007038.1";
 chr1   HAVANA  exon    140075  140339  .       -       .       gene_id "ENSG00000239906.1";transcript_id "ENST00000493797.1";gene_type "antisense";gene_status "NOVEL";gene_name "RP11-34P13.14";transcript_type "antisense";transcript_status "KNOWN";transcript_name "RP11-34P13.14-001";exon_number 1;exon_id "ENSE00001913281.1";level 2;tag "basic";havana_gene "OTTHUMG00000002481.1";havana_transcript "OTTHUMT00000007038.1";
 chr1   HAVANA  exon    141474  143011  .       -       .       gene_id "ENSG00000241860.3";transcript_id "ENST00000484859.1";gene_type "processed_transcript";gene_status "NOVEL";gene_name "RP11-34P13.13";transcript_type "antisense";transcript_status "KNOWN";transcript_name "RP11-34P13.13-004";exon_number 2;exon_id "ENSE00001911218.1";level 2;tag "basic";havana_gene "OTTHUMG00000002480.3";havana_transcript "OTTHUMT00000007035.1";
 chr1   HAVANA  exon    142808  143011  .       -       .       gene_id "ENSG00000241860.3";transcript_id "ENST00000490997.2";gene_type "processed_transcript";gene_status "NOVEL";gene_name "RP11-34P13.13";transcript_type "antisense";transcript_status "KNOWN";transcript_name "RP11-34P13.13-003";exon_number 3;exon_id "ENSE00001838397.1";level 2;tag "basic";havana_gene "OTTHUMG00000002480.3";havana_transcript "OTTHUMT00000007036.1";
 chr1   HAVANA  exon    146386  149707  .       -       .       gene_id "ENSG00000241860.3";transcript_id "ENST00000484859.1";gene_type "processed_transcript";gene_status "NOVEL";gene_name "RP11-34P13.13";transcript_type "antisense";transcript_status "KNOWN";transcript_name "RP11-34P13.13-004";exon_number 1;exon_id "ENSE00001860404.1";level 2;tag "basic";havana_gene "OTTHUMG00000002480.3";havana_transcript "OTTHUMT00000007035.1";
 chr1   HAVANA  exon    146386  146509  .       -       .       gene_id "ENSG00000241860.3";transcript_id "ENST00000490997.2";gene_type "processed_transcript";gene_status "NOVEL";gene_name "RP11-34P13.13";transcript_type "antisense";transcript_status "KNOWN";transcript_name "RP11-34P13.13-003";exon_number 2;exon_id "ENSE00001853409.1";level 2;tag "basic";havana_gene "OTTHUMG00000002480.3";havana_transcript "OTTHUMT00000007036.1";
 chr1   HAVANA  exon    146642  146831  .       -       .       gene_id "ENSG00000241860.3";transcript_id "ENST00000490997.2";gene_type "processed_transcript";gene_status "NOVEL";gene_name "RP11-34P13.13";transcript_type "antisense";transcript_status "KNOWN";transcript_name "RP11-34P13.13-003";exon_number 1;exon_id "ENSE00001868647.1";level 2;tag "basic";havana_gene "OTTHUMG00000002480.3";havana_transcript "OTTHUMT00000007036.1";
 chr1   HAVANA  exon    450740  451678  .       -       .       gene_id "ENSG00000278566.1";transcript_id "ENST00000426406.2";gene_type "protein_coding";gene_status "KNOWN";gene_name "OR4F29";transcript_type "protein_coding";transcript_status "KNOWN";transcript_name "OR4F29-001";exon_number 1;exon_id "ENSE00002316283.2";level 2;protein_id "ENSP00000409316.1";tag "CCDS";ccdsid "CCDS41220.1";havana_gene "OTTHUMG00000002860.1";havana_transcript "OTTHUMT00000007999.1";
 chr1   HAVANA  exon    685716  686654  .       -       .       gene_id "ENSG00000273547.1";transcript_id "ENST00000332831.3";gene_type "protein_coding";gene_status "KNOWN";gene_name "OR4F16";transcript_type "protein_coding";transcript_status "KNOWN";transcript_name "OR4F16-001";exon_number 1;exon_id "ENSE00002324228.2";level 2;protein_id "ENSP00000329982.2";tag "CCDS";ccdsid "CCDS41221.1";havana_gene "OTTHUMG00000002581.1";havana_transcript "OTTHUMT00000007334.1";
 None
chrom   start   end     num     list    /data/cds_group/reference/Twist_ILMN_Exome_2.0_Plus_Panel.hg38.sorted.bed       filtered_gtf.gtf
 chr1   69090   70008   1       2       0       1
 chr1   139789  139847  1       2       0       1
 chr1   140074  140339  1       2       0       1
 chr1   141473  143011  1       2       0       1
 chr1   146385  149707  1       2       0       1
 chr1   450739  451678  1       2       0       1
 chr1   685715  686654  1       2       0       1
 chr1   760910  761154  1       2       0       1
 chr1   761777  761989  1       2       0       1
 None
Traceback (most recent call last):
  File "/data/p_magnuson_lab/conda/envs/pytmb/bin/pyEffGenomeSize.py", line 182, in <module>
    df.columns = df.iloc[0]
  File "/data/p_magnuson_lab/conda/envs/pytmb/lib/python3.10/site-packages/pandas/core/indexing.py", line 1073, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/data/p_magnuson_lab/conda/envs/pytmb/lib/python3.10/site-packages/pandas/core/indexing.py", line 1625, in _getitem_axis
    self._validate_integer(key, axis)
  File "/data/p_magnuson_lab/conda/envs/pytmb/lib/python3.10/site-packages/pandas/core/indexing.py", line 1557, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
(pytmb) [cartaij@horus src]$ echo "${CMD}"
pyEffGenomeSize.py     --gtf /data/cds_group/Processed-Data/2023-275-Jan_T-WGS-Cutaneous-Squamous-Cell-Carcinoma/data/gencode.v20.annotation.sorted.gtf     --filterNonCoding      --bed /data/cds_group/reference/Twist_ILMN_Exome_2.0_Plus_Panel.hg38.sorted.bed
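The traceback points at line 182 of pyEffGenomeSize.py, `df.columns = df.iloc[0]`. pandas raises exactly this IndexError when `.iloc[0]` is called on a DataFrame with no rows. So the immediate cause is probably that an intermediate table in the script came out empty. Which table that is, and why it is empty, is not visible from the output above. The intersection rows printed just before the traceback all show 0 under the BED column and 1 under filtered_gtf.gtf, but only the first few rows are shown.

The sketch below assumes pandas is installed. It reproduces the error and shows a guard that fails with a clearer message. `reproduce_error` and `promote_first_row_to_header` are hypothetical names, not functions from the script. The helper also drops the header row from the data, which may or may not match what the script does next.

```python
import pandas as pd


def reproduce_error() -> str:
    """Call .iloc[0] on a table that has columns but zero rows."""
    empty = pd.DataFrame(columns=["chrom", "start", "end"])
    try:
        empty.iloc[0]
    except IndexError as err:
        # pandas reports: "single positional indexer is out-of-bounds"
        return str(err)
    return "no error"


def promote_first_row_to_header(df: pd.DataFrame) -> pd.DataFrame:
    """Use row 0 as the column names and drop it from the data.

    Unlike a bare `df.columns = df.iloc[0]`, this fails with an explicit
    message when the table has no rows.
    """
    if len(df) == 0:
        raise ValueError(
            "table is empty, so there is no first row to use as a header; "
            "check whether the GTF and BED inputs overlap at all"
        )
    out = df.iloc[1:].copy()
    out.columns = df.iloc[0].tolist()
    return out.reset_index(drop=True)
```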

Suggestions?