TreesLab / CircMiMi

A package for constructing CLIP-seq data-supported circRNA-miRNA-mRNA interactions
MIT License

hg19 reference genome bug #2

Open MigleSur opened 1 year ago

MigleSur commented 1 year ago

Dear developers,

Thanks for creating CircMiMi, it's an easy to use and very useful tool. However, I am having problems creating an hg19 reference genome instead of hg38. I get the following error:

```
circmimi_tools genref --species hsa --source ensembl --version 75 refs/
--2022-11-09 12:05:13-- ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz
           => './Homo_sapiens.GRCh37.75.gtf.gz'
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.193.139
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.193.139|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/release-75/gtf/homo_sapiens ... done.
==> SIZE Homo_sapiens.GRCh37.75.gtf.gz ... 39344043
==> PASV ... done.    ==> RETR Homo_sapiens.GRCh37.75.gtf.gz ... done.
Length: 39344043 (38M) (unauthoritative)

Homo_sapiens.GRCh37.75.gtf.gz 100%[=================================================>] 37.52M 33.3MB/s in 1.1s

2022-11-09 12:05:15 (33.3 MB/s) - './Homo_sapiens.GRCh37.75.gtf.gz' saved [39344043]

--2022-11-09 12:05:15-- ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
           => './Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz'
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.193.139
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.193.139|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/release-75/fasta/homo_sapiens/dna ... done.
==> SIZE Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz ... 869930767
==> PASV ... done.    ==> RETR Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz ... done.
Length: 869930767 (830M) (unauthoritative)

Homo_sapiens.GRCh37.75.dna.pri 100%[=================================================>] 829.63M 45.3MB/s in 27s

2022-11-09 12:05:43 (30.4 MB/s) - './Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz' saved [869930767]

--2022-11-09 12:05:43-- ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh37.75.cdna.all.fa.gz
           => './Homo_sapiens.GRCh37.75.cdna.all.fa.gz'
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.193.139
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.193.139|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/release-75/fasta/homo_sapiens/cdna ... done.
==> SIZE Homo_sapiens.GRCh37.75.cdna.all.fa.gz ... 60646070
==> PASV ... done.    ==> RETR Homo_sapiens.GRCh37.75.cdna.all.fa.gz ... done.
Length: 60646070 (58M) (unauthoritative)

Homo_sapiens.GRCh37.75.cdna.al 100%[=================================================>] 57.84M 33.5MB/s in 1.7s

2022-11-09 12:05:46 (33.5 MB/s) - './Homo_sapiens.GRCh37.75.cdna.all.fa.gz' saved [60646070]

--2022-11-09 12:05:46-- ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/ncrna/Homo_sapiens.GRCh37.75.ncrna.fa.gz
           => './Homo_sapiens.GRCh37.75.ncrna.fa.gz'
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.193.139
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.193.139|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/release-75/fasta/homo_sapiens/ncrna ... done.
==> SIZE Homo_sapiens.GRCh37.75.ncrna.fa.gz ... 7735481
==> PASV ... done.    ==> RETR Homo_sapiens.GRCh37.75.ncrna.fa.gz ... done.
Length: 7735481 (7.4M) (unauthoritative)

Homo_sapiens.GRCh37.75.ncrna.f 100%[=================================================>] 7.38M 12.3MB/s in 0.6s

2022-11-09 12:05:47 (12.3 MB/s) - './Homo_sapiens.GRCh37.75.ncrna.fa.gz' saved [7735481]

Traceback (most recent call last):
  File "/Users/migab/miniconda3/envs/circmimi/bin/circmimi_tools", line 8, in <module>
    sys.exit(cli())
  File "/Users/migab/miniconda3/envs/circmimi/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/migab/miniconda3/envs/circmimi/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/migab/miniconda3/envs/circmimi/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/migab/miniconda3/envs/circmimi/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/migab/miniconda3/envs/circmimi/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/migab/miniconda3/envs/circmimi/lib/python3.10/site-packages/circmimi/scripts/circmimi_tools.py", line 164, in generate_references
    info, ref_files = genref.generate(species, source, version, ref_dir)
  File "/Users/migab/miniconda3/envs/circmimi/lib/python3.10/site-packages/circmimi/reference/genref.py", line 311, in generate
    anno_ref.generate()
  File "/Users/migab/miniconda3/envs/circmimi/lib/python3.10/site-packages/circmimi/reference/genref.py", line 26, in generate
    gendb.generate(self.src_name, self.filename)
  File "/Users/migab/miniconda3/envs/circmimi/lib/python3.10/site-packages/circmimi/reference/gendb.py", line 193, in generate
    tables_raw_data.parse(gtf_path)
  File "/Users/migab/miniconda3/envs/circmimi/lib/python3.10/site-packages/circmimi/reference/gendb.py", line 113, in parse
    biotypes = sorted(set(map(itemgetter(2), genes + transcripts)))
TypeError: '<' not supported between instances of 'NoneType' and 'str'
```

Could you help me with fixing this issue?

Best wishes, Migle

chiangtw commented 1 year ago

Hi Migle,

Thanks for the feedback on the CircMiMi package!

This bug is caused by the annotation GTF file from Ensembl 75, which does not have the "transcript_biotype" attribute for "transcript" records. You may try the GTF file from Gencode v19 instead; it should work nicely.

We will fix this bug in the near future.

Best, tw
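For readers who hit the same traceback, here is a minimal sketch of how a missing "transcript_biotype" attribute can lead to this `TypeError`. It is not CircMiMi's actual parser: the attribute strings, the `get_attr` helper, and the `sort_fails` helper are hypothetical and only illustrate the mechanism, namely that a missing attribute becomes `None`, and `sorted()` cannot compare `None` with a string.

```python
import re

# Hypothetical, simplified GTF attribute strings (column 9 of a GTF line).
# These are illustrative values, not real Ensembl or Gencode records.
WITH_BIOTYPE = 'gene_id "G1"; transcript_id "T1"; transcript_biotype "protein_coding";'
WITHOUT_BIOTYPE = 'gene_id "G2"; transcript_id "T2";'

# Matches key "value"; pairs in an attribute string.
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def get_attr(attr_string, key):
    """Return the value of `key` in a GTF attribute string, or None if absent."""
    return dict(ATTR_RE.findall(attr_string)).get(key)


def sort_fails(values):
    """Return True if sorting the distinct values raises TypeError."""
    try:
        sorted(set(values))
    except TypeError:
        return True
    return False


# The record without the attribute yields None alongside a real biotype.
biotypes = [get_attr(s, "transcript_biotype") for s in (WITH_BIOTYPE, WITHOUT_BIOTYPE)]

# Mixing None and str makes sorted() fail, as in the traceback above.
fails = sort_fails(biotypes)

# One possible guard (an assumption, not the maintainers' fix): drop None first.
safe_biotypes = sorted(b for b in set(biotypes) if b is not None)
```

Running the same kind of check over the "transcript" records of a GTF file before calling `circmimi_tools genref` would show whether the attribute is present in that annotation.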