BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA

Error in correct : no inconsistent.bed files #26

Closed berylc closed 2 years ago

berylc commented 5 years ago

Having issues figuring out the error message from flair correct. The command and error message are:

$python flair.py correct -f gencode.v30lift37.fixed.annotation.gtf -c hg19.chrom.sizes.fixed2.txt -q pancreas.bed
100%|##########| 31/31 [01:14<00:00,  2.03s/it]
Traceback (most recent call last):
  File "/tools/flair/bin/ssCorrect.py", line 327, in <module>
    main()
  File "tools/flair/bin/ssCorrect.py", line 309, in main
    with open(os.path.join(tempDir, "%s_inconsistent.bed" % chrom),'rb') as fd:
FileNotFoundError: [Errno 2] No such file or directory: '/analysis/flair/tmp_bda22e0a-6614-4624-8703-4212e76e4505/7_inconsistent.bed'
Correction command did not exit with success status

It seems to be failing at the splice site correction step 5/5

The gencode GTF, the bed file and the chromosome size file all have chr7 represented.

Also, not sure if it helps, but there is a _known_juncs.bed and _temp_reads.bed in the temp folder for every chromosome, but there are no *_inconsistent.bed files.

Any idea how I can get this to run? Thank you!

csoulette commented 5 years ago

Hi berylc,

According to the error, the script is looking for a bed file of corrected/inconsistent reads but cannot find it. The prefix for these corrected/inconsistent files comes from the chromosome names in the annotated junction files you supply (mainly the GTF). You mentioned that "chr7" is represented in your GTF, but the prefix for the file that is throwing the error is "7". Is this supposed to be "chr7"? Since you still have the folder with the temporary files, can you check that the prefix matches for each _known_juncs.bed and _temp_reads.bed? Also, if you look in any of these corresponding files, do the chromosome names match? For example, there should be a chr7_known_juncs.bed and chr7_temp_reads.bed, and within each file, the first column should have consistent chromosome name formatting ("chr7").

Thanks~

-CMS
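The naming check described above can be sketched in Python. This is a minimal illustration, not part of FLAIR: the function names `chromosome_names` and `compare_chromosome_names` are hypothetical, and it assumes each input (GTF, BED, chromosome sizes) is a whitespace-delimited text file whose first column is the chromosome name.

```python
def chromosome_names(path, comment_prefix="#"):
    """Return the set of values found in the first column of a text file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            # Skip blank lines and header/comment lines such as GTF "#!" headers.
            if not line.strip() or line.startswith(comment_prefix):
                continue
            names.add(line.split(None, 1)[0])
    return names


def compare_chromosome_names(paths):
    """Return (names shared by all files, names per file that are not shared)."""
    per_file = {str(path): chromosome_names(path) for path in paths}
    shared = set.intersection(*per_file.values()) if per_file else set()
    unshared = {name: chroms - shared for name, chroms in per_file.items()}
    return shared, unshared
```

If one file uses "chr7" and another uses "7", the two spellings show up in the per-file leftovers rather than in the shared set, which makes a prefix mismatch easy to spot.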

berylc commented 5 years ago

Thanks for the quick response! I think you're asking if there's any 'chr7' vs '7' discrepancies in my input and output files? That is not the case.

Here are the contents of the temp folder

10_known_juncs.bed  15_known_juncs.bed  1_known_juncs.bed   3_known_juncs.bed  8_known_juncs.bed           GL000202.1_known_juncs.bed  GL000237.1_known_juncs.bed
10_temp_reads.bed   15_temp_reads.bed   1_temp_reads.bed    3_temp_reads.bed   8_temp_reads.bed            GL000204.1_known_juncs.bed  GL000241.1_known_juncs.bed
11_known_juncs.bed  16_known_juncs.bed  20_known_juncs.bed  4_known_juncs.bed  9_known_juncs.bed           GL000205.1_known_juncs.bed  GL000241.1_temp_reads.bed
11_temp_reads.bed   16_temp_reads.bed   20_temp_reads.bed   4_temp_reads.bed   9_temp_reads.bed            GL000205.1_temp_reads.bed   M_known_juncs.bed
12_known_juncs.bed  17_known_juncs.bed  21_known_juncs.bed  5_known_juncs.bed  GL000192.1_known_juncs.bed  GL000212.1_known_juncs.bed  X_known_juncs.bed
12_temp_reads.bed   17_temp_reads.bed   21_temp_reads.bed   5_temp_reads.bed   GL000192.1_temp_reads.bed   GL000212.1_temp_reads.bed   X_temp_reads.bed
13_known_juncs.bed  18_known_juncs.bed  22_known_juncs.bed  6_known_juncs.bed  GL000193.1_known_juncs.bed  GL000220.1_known_juncs.bed  Y_known_juncs.bed
13_temp_reads.bed   18_temp_reads.bed   22_temp_reads.bed   6_temp_reads.bed   GL000195.1_known_juncs.bed  GL000220.1_temp_reads.bed   Y_temp_reads.bed
14_known_juncs.bed  19_known_juncs.bed  2_known_juncs.bed   7_known_juncs.bed  GL000195.1_temp_reads.bed   GL000228.1_known_juncs.bed
14_temp_reads.bed   19_temp_reads.bed   2_temp_reads.bed    7_temp_reads.bed   GL000199.1_known_juncs.bed  GL000228.1_temp_reads.bed

I've confirmed there's no chr prefix in the files.

Here are the first lines of the sample bed (from bam2Bed12 in flair), gtf, and chromosome sizes files:

bed

1   14399   18957   d4215a58-0cff-452f-8f86-bd2dd77a3c04;0  1   -   14399   18957   217,95,2    7   29,180,145,137,147,99,45,   0,2476,2824,3206,3515,3868,4513,

gtf

chr sizes

1   249250621

Also FWIW, while flair correct fails at this step every time, the specific inconsistent_juncs.bed file it doesn't find is different with each run (probably because none of the inconsistent files are being generated).

I can share samples files I have if that helps?

one other point is that the program core dumps, and I'm not sure if this is due to the error or memory - I am using 20G, but let me know if that's not enough?

csoulette commented 5 years ago

Hi berylc,

Thanks for confirming that the prefixes are consistent across your files.

... the specific inconsistent_juncs.bed file it doesn't find is different with each run (probably because none of the inconsistent files are being generated).

Yes, this is expected behavior since the *_inconsistent files that the script checks/reads through are not stored in a sorted manner.

one other point is that the program core dumps, and I'm not sure if this is due to the error or memory - I am using 20G, but let me know if that's not enough?

We haven't run into memory issues. The only parameter that might cause a memory issue would be using a very large number of threads, but it seems like you aren't using that parameter? We've run correct on a small laptop with 8 GB of memory and 4 threads just fine.

This memory error is more likely due to failure of the ssCorrect helper script ssPrep, which would make sense since ssPrep is the script that produces the _corrected and _inconsistent bed files. Also, I've been able to reproduce your error if I force ssPrep to fail during step 5 of correction. It would be helpful to look at the entire stack trace of error calls when running correct. Are there any other error messages that you see before FLAIR reports that the Correction command did not exit with success status?

Thanks~

-CMS

berylc commented 5 years ago

This is the only output message

python /humgen/atgu1/fs03/berylc/tools/flair/flair.py  correct -f gencode.v30lift37.fixed.annotation.gtf -c hg19.chrom.sizes.fixed2.txt -q pancreas.bed
100%|##########| 31/31 [00:37<00:00,  1.01s/it]
Traceback (most recent call last):
  File "/humgen/atgu1/fs03/berylc/tools/flair/bin/ssCorrect.py", line 327, in <module>
    main()
  File "/humgen/atgu1/fs03/berylc/tools/flair/bin/ssCorrect.py", line 309, in main
    with open(os.path.join(tempDir, "%s_inconsistent.bed" % chrom),'rb') as fd:
FileNotFoundError: [Errno 2] No such file or directory: 'tmp_f9a2e2bc-4c26-4811-831c-d6227a12346f/5_inconsistent.bed'
Correction command did not exit with success status

Just to make sure, I also tried on a 50G node. I tried not specifying the thread parameter (which is what I'd been doing) and also -t 2 and -t 10. And I see the same error. Any thoughts?

berylc commented 5 years ago

If you'd like to try to recreate the error on your side, I've placed all the files in https://personal.broadinstitute.org/berylc/flair/

I will have to remove them once you download them though. Thanks!

csoulette commented 5 years ago

Hi berylc,

Thanks for passing along the information / data. I've downloaded it and will do tests to see what is going on.

-CMS

csoulette commented 5 years ago

Hi berylc,

I took a look at your data and found some peculiarities with your annotation file. The error I received had to do with 0-length introns that were called from annotation file parsing. I found 819 instances in which an exon annotation had been split into more than 1 GTF entry. This is something I haven't observed in the gencode annotation files I've been working with. Here is an example of a transcript with 3 exon entries, but only 1 intron.

[image: hgt_genome_4d44_ea33a0]

Nevertheless, I've included a fix in the newest version of ssCorrect and ssPrep to merge such split exons. Let me know if there are any further issues!

-CMS
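To illustrate why a split exon yields a 0-length intron and how merging avoids it, here is a small sketch. It is not the actual fix in ssCorrect/ssPrep; the helper names are hypothetical, and it assumes exons are given as half-open (start, end) pairs, so two entries that touch share a coordinate.

```python
def introns_from_exons(exons):
    """Return the gaps between consecutive exons as (start, end) pairs."""
    return [(left[1], right[0]) for left, right in zip(exons, exons[1:])]


def merge_adjacent_exons(exons):
    """Merge exon entries that touch or overlap into single exons."""
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            # This entry continues the previous exon, so extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Three exon entries, the first two being one exon split in two.
split_exons = [(100, 200), (200, 300), (500, 600)]
naive_introns = introns_from_exons(split_exons)
merged_introns = introns_from_exons(merge_adjacent_exons(split_exons))
```

Here `naive_introns` contains the zero-length pair (200, 200), while `merged_introns` contains only the single real intron (300, 500).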

berylc commented 5 years ago

Hi Carmen, thanks for looking into this! Are you sure the issue was fixed? I am still getting the same error. The flair repository I have is up to date.

$python /humgen/atgu1/fs03/berylc/tools/flair/flair.py correct -f gencode.v30lift37.fixed.annotation.gtf -c hg19.chrom.sizes.fixed2.txt -q sample.bed -g Homo_sapiens_assembly19.fasta
100%|##########| 31/31 [00:53<00:00,  1.71s/it]
Traceback (most recent call last):
  File "/humgen/atgu1/fs03/berylc/tools/flair/bin/ssCorrect.py", line 345, in <module>
    main()
  File "/humgen/atgu1/fs03/berylc/tools/flair/bin/ssCorrect.py", line 327, in main
    with open(os.path.join(tempDir, "%s_inconsistent.bed" % chrom),'rb') as fd:
FileNotFoundError: [Errno 2] No such file or directory: 'analysis/flair/correct/tmp_399cbad7-2e5e-4b65-8e81-dd741e27cbce/1_inconsistent.bed'
Correction command did not exit with success status

berylc commented 5 years ago

Also wanted to update that I tried with different GTFs (gencode and ensembl) and it is throwing the same error. These are the publicly available hg19 GTFs that are commonly used, so I don't think they should have major errors. Here is how I get the GTFs:

Gencode hg19 GTF

$wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz

$gunzip gencode.v19.annotation.gtf.gz

$cat gencode.v19.annotation.gtf | sed 's/^chr//' > gencode.v19.annotation.fixed.gtf

Running flair

python flair.py correct -f gencode.v19.annotation.fixed.gtf -c hg19.chrom.sizes..txt -q sample.bed -g Homo_sapiens_assembly19.fasta

100%|##########| 24/24 [00:49<00:00,  2.05s/it]
Traceback (most recent call last):
  File "/flair/bin/ssCorrect.py", line 345, in <module>
    main()
  File "lair/bin/ssCorrect.py", line 327, in main
    with open(os.path.join(tempDir, "%s_inconsistent.bed" % chrom),'rb') as fd:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp_d4074840-25e9-492e-97eb-3b01003dab09/1_inconsistent.bed'
Correction command did not exit with success status

Also I accidentally ran correct on gencode.v19.annotation.gtf and not the .fixed.gtf version, and it ran but gave an empty file (just mentioning because it may be a useful check to give an error)

Ran on ENSEMBL hg19 GTF

wget ftp://ftp.ensembl.org/pub/grch37/release-85/gtf/homo_sapiens/Homo_sapiens.GRCh37.85.gtf.gz

python flair.py correct -f Homo_sapiens.GRCh37.85.gtf -c hg19.chrom.sizes..txt -q sample.bed -g Homo_sapiens_assembly19.fasta
100%|##########| 45/45 [19:08<00:00, 25.53s/it]
Traceback (most recent call last):
  File "flair/bin/ssCorrect.py", line 345, in <module>
    main()
  File "flair/bin/ssCorrect.py", line 327, in main
    with open(os.path.join(tempDir, "%s_inconsistent.bed" % chrom),'rb') as fd:
FileNotFoundError: [Errno 2] No such file or directory: 'tmp_f8fd5b1f-b4b9-4484-b3e4-bcfa3f9d3603/1_inconsistent.bed'
Correction command did not exit with success status

Interestingly this time, here are the contents of the tmp folder

10_known_juncs.bed  18_temp_reads.bed   5_known_juncs.bed           GL000195.1_known_juncs.bed   GL000212.1_known_juncs.bed   GL000221.1_temp_reads.bed   GL000241.1_corrected.bed
10_temp_reads.bed   19_known_juncs.bed  5_temp_reads.bed            GL000195.1_temp_reads.bed    GL000212.1_temp_reads.bed    GL000222.1_known_juncs.bed  GL000241.1_inconsistent.bed
11_known_juncs.bed  19_temp_reads.bed   6_known_juncs.bed           GL000196.1_corrected.bed     GL000213.1_known_juncs.bed   GL000222.1_temp_reads.bed   GL000241.1_known_juncs.bed
11_temp_reads.bed   1_known_juncs.bed   6_temp_reads.bed            GL000196.1_inconsistent.bed  GL000215.1_known_juncs.bed   GL000223.1_known_juncs.bed  GL000241.1_temp_reads.bed
12_known_juncs.bed  1_temp_reads.bed    7_known_juncs.bed           GL000196.1_known_juncs.bed   GL000216.1_corrected.bed     GL000223.1_temp_reads.bed   GL000242.1_known_juncs.bed
12_temp_reads.bed   20_known_juncs.bed  7_temp_reads.bed            GL000196.1_temp_reads.bed    GL000216.1_inconsistent.bed  GL000224.1_known_juncs.bed  GL000242.1_temp_reads.bed
13_known_juncs.bed  20_temp_reads.bed   8_known_juncs.bed           GL000199.1_known_juncs.bed   GL000216.1_known_juncs.bed   GL000224.1_temp_reads.bed   GL000243.1_known_juncs.bed
13_temp_reads.bed   21_known_juncs.bed  8_temp_reads.bed            GL000201.1_known_juncs.bed   GL000216.1_temp_reads.bed    GL000225.1_known_juncs.bed  GL000247.1_known_juncs.bed
14_known_juncs.bed  21_temp_reads.bed   9_known_juncs.bed           GL000201.1_temp_reads.bed    GL000218.1_known_juncs.bed   GL000228.1_known_juncs.bed  MT_known_juncs.bed
14_temp_reads.bed   22_known_juncs.bed  9_temp_reads.bed            GL000204.1_known_juncs.bed   GL000218.1_temp_reads.bed    GL000228.1_temp_reads.bed   MT_temp_reads.bed
15_known_juncs.bed  22_temp_reads.bed   GL000191.1_known_juncs.bed  GL000205.1_known_juncs.bed   GL000219.1_known_juncs.bed   GL000229.1_known_juncs.bed  X_known_juncs.bed
15_temp_reads.bed   2_known_juncs.bed   GL000191.1_temp_reads.bed   GL000205.1_temp_reads.bed    GL000219.1_temp_reads.bed    GL000230.1_known_juncs.bed  X_temp_reads.bed
16_known_juncs.bed  2_temp_reads.bed    GL000192.1_known_juncs.bed  GL000209.1_known_juncs.bed   GL000220.1_known_juncs.bed   GL000231.1_known_juncs.bed  Y_known_juncs.bed
16_temp_reads.bed   3_known_juncs.bed   GL000192.1_temp_reads.bed   GL000211.1_corrected.bed     GL000220.1_temp_reads.bed    GL000233.1_known_juncs.bed  Y_temp_reads.bed
17_known_juncs.bed  3_temp_reads.bed    GL000193.1_known_juncs.bed  GL000211.1_inconsistent.bed  GL000221.1_corrected.bed     GL000236.1_known_juncs.bed
17_temp_reads.bed   4_known_juncs.bed   GL000194.1_known_juncs.bed  GL000211.1_known_juncs.bed   GL000221.1_inconsistent.bed  GL000237.1_known_juncs.bed
18_known_juncs.bed  4_temp_reads.bed    GL000194.1_temp_reads.bed   GL000211.1_temp_reads.bed    GL000221.1_known_juncs.bed   GL000240.1_known_juncs.bed

Any ideas? I don't think I have any other human gene annotation GTFs to try...

csoulette commented 5 years ago

Hi berylc,

Thanks for your patience!

I added an extra option to correct that should help by adding a little more granularity into some of the steps/possible errors when running ssCorrect. You can find it in the latest update to flair. Please rerun flair correct with the --print_check option. This option will produce a file in your current working directory with the same suffix as the temporary directory created while running correct.

I find it strange that there are no system calls for any of the helper scripts failing, which I suspect is happening since the _corrected / _inconsistent files are not being generated. The additional --print_check option should help in figuring out if the helper scripts are failing.

Thanks~

-CMS
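One way to see at a glance which chromosomes are missing their output files is to scan the temporary directory. The sketch below is a hypothetical helper, not part of FLAIR; the file suffixes are taken from the directory listings in this thread, and it assumes that only chromosomes with a `_temp_reads.bed` file are expected to produce `_corrected.bed` and `_inconsistent.bed` files.

```python
from collections import defaultdict
from pathlib import Path

READS_SUFFIX = "_temp_reads.bed"
OUTPUT_SUFFIXES = ("_corrected.bed", "_inconsistent.bed")


def missing_outputs(temp_dir):
    """Map each chromosome that has reads to the output files it lacks."""
    found = defaultdict(set)
    for path in Path(temp_dir).iterdir():
        for suffix in (READS_SUFFIX,) + OUTPUT_SUFFIXES:
            if path.name.endswith(suffix):
                chrom = path.name[: -len(suffix)]
                found[chrom].add(suffix)
    report = {}
    for chrom in sorted(found):
        # Assumption: chromosomes without reads are not expected to have outputs.
        if READS_SUFFIX not in found[chrom]:
            continue
        missing = [s for s in OUTPUT_SUFFIXES if s not in found[chrom]]
        if missing:
            report[chrom] = missing
    return report
```

An empty result would mean every chromosome with reads has both output files; any entry points at a chromosome where the helper script stopped before writing its results.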

berylc commented 5 years ago

No worries, hopefully we can figure out a solution.

I ran with print check but am not seeing any updated error message?

ensembl gtf

$python /humgen/atgu1/fs03/berylc/tools/flair/flair.py correct -f Homo_sapiens.GRCh37.85.gtf -c hg19.chrom.sizes.txt -q sample.bed -g Homo_sapiens_assembly19.fasta --print_check

Step 5/5: Correcting Splice Sites:   0%|                                                                                                                    | 0/45 [00:00<?, ?it/s]

Step 5/5: Correcting Splice Sites:   2%|##3                                                                                                      | 1/45 [06:32<4:47:36, 392.20s/it]

Step 5/5: Correcting Splice Sites:  16%|################3                                                                                        | 7/45 [10:13<1:42:23, 161.68s/it]

Step 5/5: Correcting Splice Sites:  29%|##############################9                                                                            | 13/45 [14:54<34:26, 64.59s/it]

Step 5/5: Correcting Splice Sites:  44%|###############################################5                                                           | 20/45 [17:16<10:42, 25.70s/it]
Step 5/5: Correcting Splice Sites: 100%|##########| 45/45 [19:00<00:00, 25.34s/it]
Traceback (most recent call last):
  File "/humgen/atgu1/fs03/berylc/tools/flair/bin/ssCorrect.py", line 369, in <module>
    main()
  File "/humgen/atgu1/fs03/berylc/tools/flair/bin/ssCorrect.py", line 351, in main
    with open(os.path.join(tempDir, "%s_inconsistent.bed" % chrom),'rb') as fd:
FileNotFoundError: [Errno 2] No such file or directory: '/broad/macarthur/berylc/longread/gtex_5/analysis/flair/correct/tmp_80821498-610c-4d47-9056-c3a2659355fc/1_inconsistent.bed'
Correction command did not exit with success status

There's an error file which I've put here: https://personal.broadinstitute.org/berylc/flair/err_tmp_80821498-610c-4d47-9056-c3a2659355fc.txt

The last line is related to GL000192, which is in the bed, gtf and chrom sizes file.

gencode v19 gtf

$python flair.py correct -f gencode.v19.annotation.fixed.gtf -c hg19.chrom.sizes.txt -q sample.bed -g Homo_sapiens_assembly19.fasta --print_check

100%|##########| 24/24 [00:59<00:00,  2.46s/it]
Traceback (most recent call last):
  File "flair/bin/ssCorrect.py", line 369, in <module>
    main()
  File "flair/bin/ssCorrect.py", line 351, in main
    with open(os.path.join(tempDir, "%s_inconsistent.bed" % chrom),'rb') as fd:
FileNotFoundError: [Errno 2] No such file or directory: 'tmp_85e4181d-2d25-472a-a9a0-81c68bcd5249/1_inconsistent.bed'
Correction command did not exit with success status

Error file in https://personal.broadinstitute.org/berylc/flair/err_tmp_85e4181d-2d25-472a-a9a0-81c68bcd5249.txt

csoulette commented 5 years ago

Hi berylc,

According to the lack of errors in the new Error files you've uploaded, it seems like ssPrep is starting, but exiting without error. So weird. I've tried adding a few more print statements to check that the splice site database is being constructed without error. I've also added a check to ensure that each _corrected.bed and _inconsistent.bed file is being created. Again, these files should be created even if your known junction database is empty, or if your _temp_reads.bed files are empty.

Something that may help elucidate the problem would be to call the helper script itself, rather than calling it through the flair.py wrapper. To do this, you can run ssPrep on any set of temporary files in your tmp directory, e.g.:

python ~/bin/flair/bin/ssPrep.py -i ./tmp_436833de-538a-47ea-810f-0850b12eb159/11_temp_reads.bed -j ./tmp_436833de-538a-47ea-810f-0850b12eb159/11_known_juncs.bed -o chr11 -f hg19.fixed.fa --workingDir ./ --check_file chr11_err_check.txt

Running this command should create 3 files in your working directory:

chr11_corrected.bed
chr11_inconsistent.bed
chr11_err_check.txt

Thanks~

-CMS

berylc commented 5 years ago

Hey Carmen, I tried this with the gencode 19 and Ensembl gtfs just to be sure.

python /tools/flair/bin/ssPrep.py -i ./tmp_ec17dd8f-3517-444c-bb26-fbc38fb555b5/11_temp_reads.bed -j ./tmp_ec17dd8f-3517-444c-bb26-fbc38fb555b5/11_known_juncs.bed -o chr11 --workingDir ./ --check_file chr11_err_check.txt -f Homo_sapiens_assembly19.fasta

I keep getting a Segmentation fault (core dumped) error. I've gone up to 100G of memory. I also went back and ran correct with 8 threads just to check, still get the same error.

Here's the output of err_check

** Correcting ./tmp_beb390ea-7262-4e0d-8366-f9ba135424ae/11_temp_reads.bed with a wiggle of 15 against ./tmp_beb390ea-7262-4e0d-8366-f9ba135424ae/11_known_juncs.bed. Checking splice sites with genome Homo_sapiens_assembly19.fasta.
** Initializing int tree for chromosome chr11
** Checking SS motifs for chromosome chr11
** Checked 30384 splice sites for chromosome chr11... Adding to int tree
** Correcting ./tmp_b0a4b964-6c7f-4bef-9a88-0117358a1e84/11_temp_reads.bed with a wiggle of 15 against ./tmp_b0a4b964-6c7f-4bef-9a88-0117358a1e84/11_known_juncs.bed. Checking splice sites with genome Homo_sapiens_assembly19.fasta.
** Initializing int tree for chromosome chr11
** Checking SS motifs for chromosome chr11
** Checked 47983 splice sites for chromosome chr11... Adding to int tree

This is a human long read file with about 8 million reads. The gtfs are ~ 1.4G, the bed is 1.1 G and the fasta is 3.6G. Do you think it's the memory error?

csoulette commented 5 years ago

Hi berylc,

Which scripts are causing these core dumps? ssPrep.py, or flair.py correct?

The segmentation fault is suspicious, but none of the input files are stored entirely in memory. Do you know if you have a memory usage restriction? If so, what is it? I believe you can check with ulimit -a.

Thanks for the error output. I'm still narrowing down where the script is failing. It's not yet clear why or what is causing the script to fail. The memory intensive part of the script is finishing (according to the print_check). I've added a few more print statements to figure out where specifically the script is exiting/failing. You can pull the latest version and rerun ssPrep.py.
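The same limits that ulimit -a reports can also be read from inside Python, which shows what the interpreter running the scripts actually sees. This is a small sketch using the standard library `resource` module (Unix only); the `LIMITS` table and function names are illustrative choices, not part of FLAIR. Note that `getrlimit` reports memory-related limits in bytes, whereas ulimit prints most of them in kbytes.

```python
import resource

# A few limits that are relevant when a process is killed or dumps core.
LIMITS = {
    "address space (virtual memory)": resource.RLIMIT_AS,
    "data segment": resource.RLIMIT_DATA,
    "stack": resource.RLIMIT_STACK,
    "core file size": resource.RLIMIT_CORE,
    "open files": resource.RLIMIT_NOFILE,
}


def format_limit(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {description: (soft limit, hard limit)} for the current process."""
    return {
        name: tuple(format_limit(v) for v in resource.getrlimit(const))
        for name, const in LIMITS.items()
    }
```

Calling `current_limits()` in the same environment (for example inside the cluster job) gives the soft and hard values that apply to that process.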

berylc commented 5 years ago

The Segmentation Fault errors were coming out of ssPrep.py

ulimit -a output:

core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 514765
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65535
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65536
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I re ran flair correct with print check to get the temp files error file here https://personal.broadinstitute.org/berylc/flair/err_tmp_ae416eaa-8109-4b9f-b5da-b8c0ab713ffe.txt

Then ran ssPrep

python tools/flair/bin/ssPrep.py -i ./tmp_ff25fa9c-9f6a-456f-8dbb-48812b971688/11_temp_reads.bed -j tmp_ff25fa9c-9f6a-456f-8dbb-48812b971688/11_known_juncs.bed -o chr11 --workingDir ./ -f Homo_sapiens_assembly19.fasta

Are you sure you didn't comment out print statements for ssPrep? This doesn't give an output or any other error message other than Segmentation fault (core dumped)

Let me know if this helps. I might move on from this tool at this point, but you have all the files I've been running on - has it been working on your side? Happy to provide any additional files.
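A helper script that dies from a segmentation fault does not raise a Python exception in the calling script, so the failure only becomes visible if the caller inspects the exit status. The following sketch shows one way to surface it with `subprocess`; it is an illustration under the assumption that the helper is launched directly (not through a shell), and the function names are hypothetical rather than FLAIR's own.

```python
import signal
import subprocess
import sys


def describe_returncode(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode == 0:
        return "exited successfully"
    if returncode < 0:
        # On POSIX, a negative value means the child was killed by that signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"terminated by {name}"
    return f"exited with status {returncode}"


def run_and_report(command):
    """Run a command (list of arguments) and describe how it ended."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode, describe_returncode(completed.returncode)


# Example with a stand-in child process that exits with status 3.
status = run_and_report([sys.executable, "-c", "import sys; sys.exit(3)"])
```

With a check like this, a crash in the helper would be reported as "terminated by SIGSEGV" at the point where it happens, instead of appearing later as a missing output file.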

csoulette commented 5 years ago

Hi berylc,

This behavior is odd, as ssPrep.py has a very small memory footprint and would only use at most a couple hundred MB of memory. Perhaps there is some sort of environment-related issue that you're running into when running our workflow. Moving forward, the best suggestion I have at this point would be to just use our docker image to run the workflow. You can find information on how to run the dockerized version of flair in our GitHub readme here.

jasteen commented 5 years ago

Not sure whether you got any further with this, but I'm having exactly the same problem. ssPrep doesn't appear to be generating the temporary files in the tempdir when called from flair.py correct. Whilst %s_known_juncs.bed and %s_temp_reads.bed both exist, neither %s_corrected.bed nor %s_inconsistent.bed does. This has only started happening to me since I did a fresh pull this morning; I'm trying to work out which previous commit I was working with that wasn't broken.

edit: the first commit to fail is the one where the genome.fa was added to the flair correct logic. 71c775791d1a8122310a64a1608558dc0e051a04

csoulette commented 5 years ago

Hi jasteen,

The addition of the genome.fa logic was introduced to match splice site motifs with chromosome strandedness. Aligners will sometimes call a splice site motif as both an acceptor and donor site, and so we added some logic to resolve these instances.

I've changed the code to do this splice site checking, which may resolve your issue (commit 9ef7890). Please be sure that in addition to pybedtools, you also have a copy of bedtools installed and the associated executables in your $PATH variable (pybedtools will look for fastaFromBed).

Alternatively, you can keep using the previous version of correct. The number of miscalled splice sites really depends on how well you curate/filter your supplemental junctions from short-read data.

Thanks~

-CMS
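Whether the bedtools executables are visible on $PATH can be checked from Python before running correct. The sketch below uses `shutil.which` from the standard library; the helper names are illustrative, and the executable names passed in (for example "bedtools" and "fastaFromBed") are the ones mentioned above.

```python
import shutil


def find_executables(names):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_executables(names):
    """Return the names that could not be found on PATH."""
    return [name for name, path in find_executables(names).items() if path is None]
```

For example, `missing_executables(["bedtools", "fastaFromBed"])` returns an empty list when both are found, and lists whichever one is absent otherwise.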

zwardnz commented 5 years ago

I had the same issue:

  File "/home/zoe/bin/flair-master/bin/ssCorrect.py", line 410, in <module>
    main()
  File "flair-master/bin/ssCorrect.py", line 392, in main
    with open(os.path.join(tempDir, "%s_inconsistent.bed" % chrom),'rb') as fd:
IOError: [Errno 2] No such file or directory: 'tmp_7cbb84fe-2943-483a-a8ff-5ad42fb94d05/GL000219.1_inconsistent.bed'
Correction command did not exit with success status

I then tried to run the ssPrep.py script separately as advised, which required me to install kerneltree:

pip install kerneltree

After this, flair.py correct seems to run okay.

leafiezyt commented 5 years ago

Thanks so much. Once I installed kerneltree (pip install kerneltree), directly running flair.py correct is fine now.

mmiladi commented 5 years ago

Same here, just now! :) Unfortunately kerneltree is not supported by conda, which made the dependency missing in my conda env at the first try.
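Since the missing kerneltree module only showed up when ssPrep.py was run on its own, it can help to check the Python dependencies up front, using the same interpreter that runs flair.py. This is a minimal sketch with the standard library; the function names are hypothetical, and the module names to pass (for example "kerneltree" and "pybedtools") are the ones mentioned in this thread.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def interpreter_path():
    """Return the path of the Python executable performing the check."""
    return sys.executable
```

Running `missing_modules(["kerneltree", "pybedtools"])` inside the active conda environment shows whether a package installed with pip is actually visible to that environment's Python, and `interpreter_path()` confirms which Python is being used.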

berylc commented 5 years ago

Hi @mmiladi and @zwardnz, I just came back to this issue; I'm trying to run FLAIR on my samples and getting the same error. @mmiladi, when you say kerneltree is not supported by conda, do you mean you created an environment outside of conda (i.e. kerneltree doesn't work at all in conda)?

I have a conda environment set up and see this on my institute cluster

(flair_fimm)  $pip install kerneltree --user
Requirement already satisfied: kerneltree in /home/unix/berylc/.local/lib/python3.6/site-packages (0.0.5)
Requirement already satisfied: cython in /home/unix/berylc/.local/lib/python3.6/site-packages (from kerneltree) (0.29.10)

But getting the same error on correct.

Just wanted to check, thanks!

berylc commented 5 years ago

oops didn't mean to close the issue, reopening

mmiladi commented 5 years ago

Hi @berylc,

It works. You only need to take care that pip is installed on your conda env and then use --user. Please check here for details: https://www.anaconda.com/using-pip-in-a-conda-environment/

Also, Alison has prepared a conda env file (https://github.com/BrooksLabUCSC/flair/blob/master/misc/flair_conda_env.yaml). This should make it easy to create an env directly from it by calling conda env create -f flair_conda_env.yaml.

Best, Milad

jellingford commented 4 years ago

Hi All,

I encountered identical problems to those discussed above, I've tried all of the above steps and finally found a combination which worked:

python flair.py correct AND ssCorrect.py appear to run and output correctly.

Jamie

sgiannouk commented 4 years ago

Hi everyone,

I also stepped into the same problem, but I was able to overcome it by only installing

but without having to install the conda environment. I am just running flair from its source directory.

Stavros

dywang0323 commented 4 years ago

I also ran into the same issue. I installed kerneltree; here is my command to run flair correction:

python /work/schroederrna/Software/flair-1.4/flair.py correct -c /work/schroederrna/chromosome_size.tsv -q /work/schroederrna/output_Susan_2/hbecpolya/hbecpolya_GRch38.bed12 -f /work/schroederrna/Data_nanopore/gencode.v27.chr_patch_hapl_scaff.annotation.gtf -g /work/schroederrna/Data_nanopore/Reference/gencode_v27_transcripts.fa

There's an error report:

  File "/work/schroederrna/Software/flair-1.4/bin/ssCorrect.py", line 339, in <module>
    main()
  File "/work/schroederrna/Software/flair-1.4/bin/ssCorrect.py", line 321, in main
    with open(os.path.join(tempDir, "%s_inconsistent.bed" % chrom),'rb') as fd:
FileNotFoundError: [Errno 2] No such file or directory: '/scratch/dywang/tmp_b15df542-634d-48a3-b870-97ac448129e9/chr1_inconsistent.bed'
Correction command did not exit with success status

Can you help me figure this out? Thanks.

Jeltje commented 2 years ago

Hi everyone. The latest release (v1.6.1) fixes this bug. In essence what was happening is that if no splices were found the program would not create an empty inconsistent.bed file for that chromosome, which led to the FileNotFound error downstream.

Please download the latest release, or use pip install flair-brookslab

If this does not solve the problem please reopen this ticket.
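The idea behind the fix can be illustrated with a short sketch: make sure every expected per-chromosome output file exists, creating an empty one where nothing was written, so that a later open() cannot fail. This is not the actual v1.6.1 patch; the function name and its parameters are hypothetical, and the file suffixes are the ones seen in this thread.

```python
from pathlib import Path


def ensure_output_files(temp_dir, chromosomes,
                        suffixes=("_corrected.bed", "_inconsistent.bed")):
    """Create empty placeholder files for any missing per-chromosome outputs."""
    created = []
    for chrom in chromosomes:
        for suffix in suffixes:
            path = Path(temp_dir) / f"{chrom}{suffix}"
            if not path.exists():
                # An empty file simply contributes no records downstream.
                path.touch()
                created.append(path.name)
    return created
```

Existing files are left untouched; only the missing ones are created, and the returned list names them.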