Closed lancy-liang closed 1 year ago
grep "ENSSSCG00000004016" Sus_scrofa.Sscrofa11.1.109.chr.gff3 1 ensembl gene 1527441 1550812 . - . ID=gene:ENSSSCG00000004016;biotype=protein_coding;gene_id=ENSSSCG00000004016;logic_name=ensembl;version=5 1 ensembl mRNA 1527441 1537814 . - . ID=transcript:ENSSSCT00000004441;Parent=gene:ENSSSCG00000004016;biotype=protein_coding;tag=Ensembl_canonical;transcript_id=ENSSSCT00000004441;version=5 1 ensembl mRNA 1527441 1550812 . - . ID=transcript:ENSSSCT00000104344;Parent=gene:ENSSSCG00000004016;biotype=protein_coding;transcript_id=ENSSSCT00000104344;version=1 1 ensembl mRNA 1527609 1537814 . - . ID=transcript:ENSSSCT00000095810;Parent=gene:ENSSSCG00000004016;biotype=protein_coding;transcript_id=ENSSSCT00000095810;version=1
This is a gffutils
exception. There is an issue with your database. Your gff file contains feature(s) that are not in the database. Either you are using a gff file and database that do not match, or you stopped database creation before it finished. Note that the second option is easy to do as if you kill the script before database creation is finished, a .db
file will still be made. When LABRAT
looks for a .db
file, it will find that one and use it.
I would advise to delete that .db
file and start fresh. Note that while database creation make take a while, you only have to do it once.
Hi,I have deleted the .db file and tried running it again, but I still encounter the same error: raise FeatureNotFoundError(key) gffutils.exceptions.FeatureNotFoundError: ENSSSCG00000004016.
OK. This is a difficulty with dealing with the wide variety of formatting when it comes to gff attributes. Try this. At line 755, replace
g = db[gene]
with
g = db['gene:' + gene]
If that changes and/or fixes things, I can update LABRAT with some more sophisticated code at that location.
Dear taliaferrojm, I have made the change as you suggested, but I still encounter an error:
/public/home/name/miniconda3/envs/labrat/lib/python3.6/site-packages/statsmodels/tools/_testing.py:19: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing as tm
Traceback (most recent call last):
File "/public/home/name/miniconda3/envs/labrat/bin/LABRATsc.py", line 816, in
However, the gene ENSSSCT00000095810 is different from the previously reported error.
Many thanks!
OK try this. At line 758, replace
txpf1 = tx
with
txpf1 = 'transcript:' + tx
and at line 760, replace
txpf2 = tx
with
txpf2 = 'transcript:' + tx
Dear taliaferrojm,
I have made the change as you suggested, but I still encounter an error:
Regarding the error message:
I have a single-cell 10X 3' end BAM file (generated by cellranger), and I encountered the following error when using countfrombam.py
(base) [usersname@login1 script]$ cat 8946612.err
2023-09-01 18:40:48,515 - INFO - Populating features
2023-09-01 21:57:24,505 - INFO - Populating features table and first-order relations: 1530811 features
2023-09-01 21:57:24,505 - INFO - Updating relations
2023-09-01 21:57:43,087 - INFO - Creating relations(parent) index
2023-09-01 21:57:44,960 - INFO - Creating relations(child) index
2023-09-01 21:57:47,140 - INFO - Creating features(featuretype) index
2023-09-01 21:57:48,671 - INFO - Creating features (seqid, start, end) index
2023-09-01 21:57:50,372 - INFO - Creating features (seqid, start, end, strand) index
2023-09-01 21:57:52,174 - INFO - Running ANALYSE features
Traceback (most recent call last):
File "/./workspace/scRNA_APA/labrat/script/countfrombam.py", line 356, in
Regarding the output issues:
(1) Only one file is generated: numberofposfactors.txt, and it is saved in the same path as the submitted script. If I use slurm array to submit the script, only one numberofposfactors.txt file is generated, and the others are overwritten. (2) I didn't obtain any output related to calculating PSI. (3) I read the article "Diverse cell-specific patterns of alternative polyadenylation in Drosophila" and it mentioned putting the BAM file into labrat. However, when using countfrombam.py, I didn't get the classification for ALE and TUTR. Which script should I use for a 10X 3' end BAM file? Many thanks!
You are likely going to have to make the same changes to countfrombam.py that you made to LABRATsc.py. The issues is likely the same, i.e. that the annotation you are using formats the "gene:" and "transcript:" attributes slightly differently than the GENCODE annotations LABRAT was created with.
I have made the changes as you previously instructed, but I am still getting an error.
this is the modified countfrombam.py file as per your instructions. (See attachment)(I have sent this script countfrombam.py to your email)
The gff file I used is :https://ftp.ensembl.org/pub/release-104/gff3/sus_scrofa/Sus_scrofa.Sscrofa11.1.104.chr.gff3.gz
I have made the change as you suggested(replace g = db[gene] with g = db['gene:' + gene], and txpf1 = 'transcript:' + tx,txpf2 = 'transcript:' + tx) but I still encounter an error:
I have a single-cell 10X 3' end BAM file (generated by cellranger), and I encountered the following error when using countfrombam.py
(base) @.*** script]$ cat 8946612.err 2023-09-01 18:40:48,515 - INFO - Populating features 2023-09-01 21:57:24,505 - INFO - Populating features table and first-order relations: 1530811 features 2023-09-01 21:57:24,505 - INFO - Updating relations 2023-09-01 21:57:43,087 - INFO - Creating relations(parent) index 2023-09-01 21:57:44,960 - INFO - Creating relations(child) index 2023-09-01 21:57:47,140 - INFO - Creating features(featuretype) index 2023-09-01 21:57:48,671 - INFO - Creating features (seqid, start, end) index 2023-09-01 21:57:50,372 - INFO - Creating features (seqid, start, end, strand) index 2023-09-01 21:57:52,174 - INFO - Running ANALYSE features Traceback (most recent call last): File "/./workspace/scRNA_APA/labrat/script/countfrombam.py", line 356, in positionfactorintervals = getpositionfactorintervals(args.gff, posfactors) File "/./workspace/scRNA_APA/labrat/script/countfrombam.py", line 254, in getpositionfactorintervals tx = db[txid] File "/./miniconda3/envs/labrat/lib/python3.6/site-packages/gffutils/interface.py", line 227, in getitem raise FeatureNotFoundError(key) gffutils.exceptions.FeatureNotFoundError: ENSSSCT00000042748
------------------ 原始邮件 ------------------ 发件人: "TaliaferroLab/LABRAT" @.>; 发送时间: 2023年9月8日(星期五) 晚上10:55 @.>; @.**@.>; 主题: Re: [TaliaferroLab/LABRAT] LABRATsc - [err]gffutils.exceptions.FeatureNotFoundError (Issue #9)
I need more info to help, like the error you are getting now. Did you make the changes described both here and here?
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>
I made it! thank you very much !!!
Glad you were able to figure it out!
Hi,
When I tried to use LABRATsc, I used the following command:
LABRATsc.py --mode cellbycell --gff $gff_path/Sus_scrofa.Sscrofa11.1.109.chr.gff3 --alevindir $alevindir_path --conditions $condition_path/tissue1_tissue2.tsv --readcountfilter 5 --conditionA tissue1 --conditionB tissue2
but err, (labrat) [name@login1 script]$ cat 8468137.err /public/home/name/miniconda3/envs/labrat/lib/python3.6/site-packages/statsmodels/tools/_testing.py:19: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead. import pandas.util.testing as tm Traceback (most recent call last): File "/public/home/name/miniconda3/envs/labrat/bin/LABRATsc.py", line 816, in
posfactors = getpositionfactors(args.gff, 25)
File "/public/home/name/miniconda3/envs/labrat/bin/LABRATsc.py", line 755, in getpositionfactors
g = db[gene]
File "/public/home/name/miniconda3/envs/labrat/lib/python3.6/site-packages/gffutils/interface.py", line 227, in getitem
raise FeatureNotFoundError(key)
gffutils.exceptions.FeatureNotFoundError: ENSSSCG00000004016
(labrat) [name@login1 script]$ cat 8468137.out Collecting count data... Reading data in /public/home/name/workspace/scRNA_APA/labratsc/output/tissue/alevin_out/tissue2... Done reading 500 cells Read total 545 cells Found total 1249793 reads Reading data in /public/home/name/workspace/scRNA_APA/labratsc/output/tissue/alevin_out/tissue1... Done reading 500 cells Read total 545 cells Found total 1249793 reads Adding condition/cluster info... Assigning transcripts to polyA sites... Indexing gff... Done indexing!