BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA

predictProductivity 'inconsistent naming convention' error #213

Closed MustafaElshani closed 2 years ago

MustafaElshani commented 2 years ago

I am not entirely sure whether this is related to issue 116. All flair scripts are up to date. I ran flair collapse as follows:

flair.py collapse \
  -g ReferenceGenomes/gencode/v41/GRCh38.primary_assembly.genome.fa \
  -f ReferenceGenomes/gencode/v41/gencode.v41.primary_assembly.annotation.gtf \
  -r analysis/flair/flair_concatenated/OCI_all_reads.fastq \
  -q analysis/flair/flair_concatenated/OCI_all_corrected.bed \
  -o analysis/flair/flair_collapse/OCI_collapsed -t 40

This was then followed by:

predictProductivity.py \
  -i analysis/flair/flair_collapse/OCI_collapsed.isoforms.bed \
  -g ReferenceGenomes/gencode/v41/gencode.v41.primary_assembly.annotation.gtf \
  -f /ReferenceGenomes/gencode/v41/GRCh38.primary_assembly.genome.fa \
  --longestORF > analysis/flair/OCI_productivity.bed

However, this then throws the following error:

***** WARNING: File /tmp/pybedtools.oco3v7np.tmp has inconsistent naming convention for record:
GL000009.2  58305   58308   ENSG00000278704.1   .   -

***** WARNING: File /tmp/pybedtools.oco3v7np.tmp has inconsistent naming convention for record:
GL000009.2  58305   58308   ENSG00000278704.1   .   -

Traceback (most recent call last):
  File "/root/miniconda3/envs/flair/bin/predictProductivity.py", line 371, in <module>
    main()
  File "/root/miniconda3/envs/flair/bin/predictProductivity.py", line 337, in main
    isoformObjs = predict(bed, starts, isoformObjs)
  File "/root/miniconda3/envs/flair/bin/predictProductivity.py", line 252, in predict
    isoDict[read].strand = intersection[5]
KeyError: '3793659a-3efd-4ff3-89cc-6569feb89ba5;0'

Is this an issue with GENCODE, or is there something wrong with the GTF file? I also tried running flair collapse without the GTF file, but the same error appears.

Any help would very much be appreciated

Kind Regards

Mustafa

Jeltje commented 2 years ago

Hi Mustafa,

The warning is unrelated to the error, I think. That gene is slightly misannotated in gencode v41 (its start codon doesn't have the same starting coordinate as the CDS) and I get that same warning when I run on my test files, but it's not killing predictProductivity.

Can you do a grep -C5 3793659a-3efd-4ff3-89cc-6569feb89ba5 analysis/flair/flair_collapse/OCI_collapsed.isoforms.bed and paste the results here?

MustafaElshani commented 2 years ago

Hi Jeltje

These are the results from running the above:

chr1    11366   16017   3793659a-3efd-4ff3-89cc-6569feb89ba5;0  19  +   11366   16017   1   1   4651,   0,
chr1    14403   16932   c1d81171-16b5-4df8-ab18-4c3df95b5998;16 60  -   14403   16932   1   1   2529,   0,
chr1    19466   24894   35736d34-dc47-4066-8b48-7c4c03e2ca57;0  1   +   19466   24894   1   1   5428,   0,
chr1    23376   26269   5a623bb2-9f14-4a67-838a-54c20e5a14b3;16 1   -   23376   26269   1   1   2893,   0,
chr1    25782   29347   986844b1-475c-4c40-bc54-9db02b6c0aa6;16 1   -   25782   29347   2   1   3565,   0,
chr1    28588   29347   2f8ef01b-140a-4095-a7fb-544d68c634ea;16 1   -   28588   29347   25  1   759,    0,

Mustafa

Jeltje commented 2 years ago

OK, so it's failing on the first line. I copied this (putting the tabs back in), and it runs just fine for me.

The error seems to imply that you have two extra fields (the code is looking for a + or - but gets the read name). What is the output of cut -f6 analysis/flair/flair_collapse/OCI_collapsed.isoforms.bed | head ? If that's not the strand field (+ or -), check for extra tabs between the first four fields.

What operating system are you running on?
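
The column check suggested above (is field 6 really the strand, and are there stray tabs?) can also be done in one pass over the file. The following is only a minimal sketch and is not part of FLAIR: the function name `check_bed_lines` and its messages are made up for illustration, and it assumes a tab-separated BED file with at least six columns.

```python
from typing import Iterable, List

VALID_STRANDS = {"+", "-"}


def check_bed_lines(lines: Iterable[str], min_fields: int = 6) -> List[str]:
    """Return human-readable problems found in tab-separated BED lines.

    Hypothetical helper for debugging; not FLAIR or pybedtools code.
    """
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < min_fields:
            problems.append(
                f"line {lineno}: only {len(fields)} tab-separated field(s)"
            )
            continue
        # A doubled tab shows up as an empty field and shifts later columns.
        empty = [i + 1 for i, value in enumerate(fields) if value == ""]
        if empty:
            problems.append(
                f"line {lineno}: empty field(s) at column(s) {empty} (doubled tab?)"
            )
        if fields[5] not in VALID_STRANDS:
            problems.append(
                f"line {lineno}: column 6 is {fields[5]!r}, expected '+' or '-'"
            )
    return problems


# Example with the first record from this thread, once intact and once
# with an extra tab inserted before the read name.
good = ("chr1\t11366\t16017\t3793659a-3efd-4ff3-89cc-6569feb89ba5;0"
        "\t19\t+\t11366\t16017\t1\t1\t4651,\t0,")
bad = good.replace("\t3793659a", "\t\t3793659a")

for problem in check_bed_lines([good, bad]):
    print(problem)
```

To run it on a real file, pass an open file handle instead of the list, for example `check_bed_lines(open("test.bed"))`. An empty result only means the columns look well-formed; it does not rule out other causes of the KeyError.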

MustafaElshani commented 2 years ago

Hi Jeltje

The above generated this output

+
-
+
-
-
-
+
-
+
-

Do I need to delete any particular tab? I am using Ubuntu 20.04.

Jeltje commented 2 years ago

Could you pip install flair-brookslab in a virtualenv and run that instead? I'm wondering if there's something wrong with your pybedtools.

python3 -m venv flairvenv
source flairvenv/bin/activate
pip install --upgrade pip
pip install flair-brookslab

predictProductivity -i (...)

Please note that predictProductivity no longer needs the .py extension after pip install.

MustafaElshani commented 2 years ago

Hi Jeltje

I tried the virtualenv installation above and ran predictProductivity. The same original error appeared, along with the 'inconsistent naming convention' warning.

(the code is looking for a + or - but gets the read name)

Is there something that can be adjusted in the predictProductivity code?

Jeltje commented 2 years ago

Ok, how about this: Get the first ten lines of that file: head analysis/flair/flair_collapse/OCI_collapsed.isoforms.bed > test.bed
Verify that the error happens with this test file: predictProductivity -i test.bed -g (...)
And if it does, please attach that test file to a comment so I can run it locally.

MustafaElshani commented 2 years ago

I created test.bed and ran it as suggested; the original error appears.

I have attached the test.bed file for you to have a look at.

Again, I appreciate your help.

Attachment: testbed.zip

Jeltje commented 2 years ago

I get no errors using your file. It might be your bedtools version. I'm running with v2.25.
What happens when you run bedtools --version?

MustafaElshani commented 2 years ago

I had been using bedtools v2.29.1. Suspecting a problem with it, I updated to the latest bedtools, v2.30, but the same error appeared. However, after downgrading to bedtools v2.25 there was no error, apart from one warning:

***** WARNING: File /tmp/pybedtools.bv67mfih.tmp has inconsistent naming convention for record:
GL000009.2  58305   58308   ENSG00000278704.1   .   -

***** WARNING: File /tmp/pybedtools.bv67mfih.tmp has inconsistent naming convention for record:
GL000009.2  58305   58308   ENSG00000278704.1   .   -

Now I have my predictProductivity output. Thank you for all your help; this allows me to continue with flair. I wonder what changed in bedtools to cause this problem?

Jeltje commented 2 years ago

Thank you for working through it with me! I will add a bedtools version test to the code and maybe try to figure out what changed between the versions.
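
A bedtools version test like the one mentioned above might look roughly like the sketch below. This is not the check that was added to FLAIR. All function names are hypothetical, and it assumes that `bedtools --version` prints a string such as `bedtools v2.30.0`. The only evidence it encodes is what was reported in this thread: v2.25 worked, while v2.29.1 and v2.30 reproduced the KeyError.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Only the release reported to work in this thread; other releases are untested here.
KNOWN_GOOD_SERIES = (2, 25)


def parse_bedtools_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, ...) from text such as 'bedtools v2.30.0'."""
    match = re.search(r"v(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_bedtools_version() -> Optional[Tuple[int, ...]]:
    """Return the version of the bedtools found on PATH, or None if unavailable."""
    if shutil.which("bedtools") is None:
        return None
    result = subprocess.run(
        ["bedtools", "--version"], capture_output=True, text=True, check=False
    )
    return parse_bedtools_version(result.stdout + result.stderr)


def version_warning(version: Optional[Tuple[int, ...]]) -> Optional[str]:
    """Return a warning string unless the version is in the known-good series."""
    if version is None:
        return "bedtools not found or version could not be determined"
    if version[:2] == KNOWN_GOOD_SERIES:
        return None
    shown = ".".join(str(part) for part in version)
    return (
        f"bedtools v{shown} detected; only v2.25 was reported to work in this "
        "thread (v2.29.1 and v2.30 reproduced the KeyError)"
    )


if __name__ == "__main__":
    warning = version_warning(installed_bedtools_version())
    if warning:
        print("WARNING:", warning)
```

Warning instead of exiting is a deliberate choice in this sketch, since the thread does not establish which bedtools releases between v2.25 and v2.29.1 are affected.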