flass / pantagruel

a pipeline for reconciliation of phylogenetic histories within a bacterial pangenome
GNU General Public License v3.0

task 3: speclist download #17

Closed: kuzman1306 closed this issue 5 years ago

kuzman1306 commented 5 years ago

Hi Florent,

I am using Ubuntu 18.10 in VirtualBox (host system: Windows). When task 3 started, the following error occurred:

Pantagrel pipeline task 2: complete.
[2019-08-09 02:19:50] Pantagrel pipeline task 3: initiate SQL database and load genomic object relationships.
Create new task folder '/home/kuzman/testPTGdatabase/03.database'
Will not apply HSTS. The HSTS database must be a regular and non-world-writable file.
ERROR: could not open HSTS store at '/home/kuzman/.wget-hsts'. HSTS will be disabled.
--2019-08-09 02:19:51--  http://www.uniprot.org/docs/speclist
Resolving www.uniprot.org (www.uniprot.org)... 128.175.245.185, 193.62.193.81
Connecting to www.uniprot.org (www.uniprot.org)|128.175.245.185|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.uniprot.org:443/docs/speclist [following]
--2019-08-09 02:19:51--  https://www.uniprot.org/docs/speclist
Connecting to www.uniprot.org (www.uniprot.org)|128.175.245.185|:443... connected.
HTTP request sent, awaiting response... 200 
Length: unspecified [text/html]
Saving to: ‘speclist’

speclist                [            <=>     ]   2,17M   377KB/s    in 5,8s    

2019-08-09 02:19:58 (386 KB/s) - ‘speclist’ saved [2276262]

assemblies (assembly_id, assembly_name, organism, species, subspecies, serovar, strain, taxid, primary_pubmed_id, country, isolation_source, host, clinical_source, collection_year, collection_month, collection_day, sequencing_technology, sequencing_coverage, note)
replicons (assembly_id, genomic_accession, replicon_name, replicon_type, replicon_size)
protein_products (product, nr_protein_id)
protein_fams (nr_protein_id, protein_family_id)
codingsequences (genomic_accession, locus_tag, cds_begin, cds_end, cds_strand, genbank_cds_id, nr_protein_id)
cdsfam (genbank_cds_id, gene_family_id)
FOQJ01 : BRAD659
FOQJ01.1 None Bradyrhizobium sp
BRASP FOQJ01.1
GCA_900105125.1 None Bradyrhizobium canariense
BRACAN GCA_900105125.1
GCF_000282615.1 None Bradyrhizobium sp.
BRADYR GCF_000282615.1
GCF_000284275.1 None Bradyrhizobium sp.
BRADYR2 GCF_000284275.1
GCF_000284375.1 None Bradyrhizobium japonicum
BRAJAP GCF_000284375.1
GCF_000296215.2 None Bradyrhizobium sp.
BRADYR3 GCF_000296215.2
GCF_000379585.1 None Bradyrhizobium sp.
BRADYR4 GCF_000379585.1
GCF_000426105.1 None Bradyrhizobium sp.
BRADYR5 GCF_000426105.1
GCF_000465325.1 None Bradyrhizobium sp.
BRADYR6 GCF_000465325.1
GCF_000472385.1 None Bradyrhizobium sp.
BRADYR7 GCF_000472385.1
GCF_000472425.1 None Bradyrhizobium sp.
BRADYR8 GCF_000472425.1
/home/kuzman/testPTGdatabase/03.database
Pantagrel pipeline task 3: complete.
[2019-08-09 02:20:10] Pantagrel pipeline task 4: use InterProScan to functionally annotate proteins in the database.
Create new task folder '/home/kuzman/testPTGdatabase/04.functional'
parse functional annotations of proteome from file /home/kuzman/testPTGdatabase/04.functional/InterProScan_5.36-75.0/all_complete_proteomes/all_proteomes.nr.faa.6.tsv
inserted values into tables: protein_infos,  8917; functional_annotations,  61562; interpro_terms,  5298; interpro2GO,  5999; interpro2pathway,  6903.
parse functional annotations of proteome from file /home/kuzman/testPTGdatabase/04.functional/InterProScan_5.36-75.0/all_complete_proteomes/all_proteomes.nr.faa.0.tsv
inserted values into tables: protein_infos,  8792; functional_annotations,  60945; interpro_terms,   671; interpro2GO,   575; interpro2pathway,   674.
parse functional annotations of proteome from file /home/kuzman/testPTGdatabase/04.functional/InterProScan_5.36-75.0/all_complete_proteomes/all_proteomes.nr.faa.5.tsv
inserted values into tables: protein_infos,  8807; functional_annotations,  59260; interpro_terms,   267; interpro2GO,   276; interpro2pathway,   196.
parse functional annotations of proteome from file /home/kuzman/testPTGdatabase/04.functional/InterProScan_5.36-75.0/all_complete_proteomes/all_proteomes.nr.faa.3.tsv
inserted values into tables: protein_infos,  8717; functional_annotations,  56859; interpro_terms,   158; interpro2GO,    81; interpro2pathway,   121.
parse functional annotations of proteome from file /home/kuzman/testPTGdatabase/04.functional/InterProScan_5.36-75.0/all_complete_proteomes/all_proteomes.nr.faa.2.tsv
inserted values into tables: protein_infos,  8920; functional_annotations,  61851; interpro_terms,   110; interpro2GO,   106; interpro2pathway,    77.
parse functional annotations of proteome from file /home/kuzman/testPTGdatabase/04.functional/InterProScan_5.36-75.0/all_complete_proteomes/all_proteomes.nr.faa.7.tsv
inserted values into tables: protein_infos,  8496; functional_annotations,  57193; interpro_terms,   171; interpro2GO,    94; interpro2pathway,   133.
parse functional annotations of proteome from file /home/kuzman/testPTGdatabase/04.functional/InterProScan_5.36-75.0/all_complete_proteomes/all_proteomes.nr.faa.8.tsv
inserted values into tables: protein_infos,    47; functional_annotations,   219; interpro_terms,     2; interpro2GO,     0; interpro2pathway,     0.
parse functional annotations of proteome from file /home/kuzman/testPTGdatabase/04.functional/InterProScan_5.36-75.0/all_complete_proteomes/all_proteomes.nr.faa.4.tsv
inserted values into tables: protein_infos,  8943; functional_annotations,  63467; interpro_terms,    62; interpro2GO,    37; interpro2pathway,    83.
parse functional annotations of proteome from file /home/kuzman/testPTGdatabase/04.functional/InterProScan_5.36-75.0/all_complete_proteomes/all_proteomes.nr.faa.1.tsv
inserted values into tables: protein_infos,  8943; functional_annotations,  61671; interpro_terms,   105; interpro2GO,    93; interpro2pathway,    56.
/home/kuzman/testPTGdatabase/03.database/testptgdatabase database total changes: 575957
created indexes as follows: CREATE INDEX IF NOT EXISTS funcannot_nrproteinid_idx ON functional_annotations (nr_protein_id);
                     CREATE INDEX IF NOT EXISTS funcannot_analmeth_idx ON functional_annotations (analysis_method);
                     CREATE INDEX IF NOT EXISTS funcannot_sigacc_idx ON functional_annotations (signature_accession);
                     CREATE INDEX IF NOT EXISTS funcannot_method_signacc_idx ON functional_annotations (analysis_method, signature_accession);
                     CREATE UNIQUE INDEX IF NOT EXISTS funcannot_nrproteinid_method_signacc_location_ipversion_uniq ON functional_annotations
                      (nr_protein_id, analysis_method, signature_accession, start_location, stop_location, interproscan_version);
                     CREATE INDEX IF NOT EXISTS funcannot_score_idx ON functional_annotations (score_or_evalue);
                     CREATE INDEX IF NOT EXISTS funcannot_interproid_idx ON functional_annotations (interpro_id);
                     CREATE UNIQUE INDEX IF NOT EXISTS ipterms_interproid_uniq ON interpro_terms (interpro_id);
                     CREATE UNIQUE INDEX IF NOT EXISTS ip2go_interproid_goid_uniq ON interpro2GO (interpro_id, go_id);
                     CREATE INDEX IF NOT EXISTS ip2go_interproid_idx ON interpro2GO (interpro_id);
                     CREATE INDEX IF NOT EXISTS ip2go_goid_idx ON interpro2GO (go_id);
                     CREATE UNIQUE INDEX IF NOT EXISTS ip2pw_interproid_pwdb_pwid_uniq ON interpro2pathways (interpro_id, pathway_db, pathway_id);
                     CREATE INDEX IF NOT EXISTS ip2pw_pwdb_pwid_idx ON interpro2pathways (pathway_db, pathway_id);
                     CREATE INDEX IF NOT EXISTS ip2pw_interproid_idx ON interpro2pathways (interpro_id);
                     CREATE INDEX IF NOT EXISTS ip2pw_pwdb_idx ON interpro2pathways (pathway_db);
                     CREATE INDEX IF NOT EXISTS ip2pw_pwid_idx ON interpro2pathways (pathway_id);
Pantagrel pipeline task 4: complete.

... (full log attached)

However, the pipeline seems to have downloaded "speclist" from www.uniprot.org successfully. Task 6 is currently running.
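As a quick sanity check, the downloaded file can be parsed to confirm that it really is UniProt's species list and not an HTML page. This is worth doing because wget reported the content type as "text/html". The sketch below is not part of Pantagruel. It assumes the usual speclist entry layout: a mnemonic code, a kingdom letter, a taxon node, then "N=" followed by the official name. The function name parse_speclist is hypothetical.

```python
import re

# Assumed layout of a speclist entry line, e.g. "ECOLI B     562: N=Escherichia coli":
# mnemonic code (1-5 chars), kingdom letter, taxon node, then "N=" and the official name.
# Continuation lines (C= common name, S= synonym) start with whitespace and are skipped.
ENTRY_RE = re.compile(r"^([A-Z0-9]{1,5})\s+([A-Z])\s+(\d+):\s+N=(.+)$")


def parse_speclist(lines):
    """Map each mnemonic code to (kingdom, taxon node, official name)."""
    entries = {}
    for line in lines:
        match = ENTRY_RE.match(line.rstrip("\n"))
        if match:
            code, kingdom, taxid, name = match.groups()
            entries[code] = (kingdom, int(taxid), name.strip())
    return entries


if __name__ == "__main__":
    sample = [
        "HUMAN E    9606: N=Homo sapiens",
        "                 C=Human",
        "ECOLI B     562: N=Escherichia coli",
    ]
    print(parse_speclist(sample)["ECOLI"])  # ('B', 562, 'Escherichia coli')
```

Running it on the real file, e.g. with open("speclist", encoding="utf-8") as fh: parse_speclist(fh), should return many thousands of entries. An empty result would suggest the download saved something other than the plain-text list.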

Cheers,

Nemanja

log_file.txt

flass commented 5 years ago

Thank you for reporting this issue. As we discussed before, this is a bug in the wget download utility when it is used in a Windows environment; see https://lists.gnu.org/archive/html/bug-wget/2016-06/msg00072.html.

The message

ERROR: could not open HSTS store at '/home/kuzman/.wget-hsts'. HSTS will be disabled.

should be read as a warning rather than an error. wget simply disables HSTS, as the message says, and carries on: the plain HTTP request is redirected to HTTPS and the file is downloaded normally, as shown in the log.
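If you want to silence the warning, the log itself states the condition wget enforces: "The HSTS database must be a regular and non-world-writable file." The following sketch checks that condition on ~/.wget-hsts and, if needed, removes the world-write bit. The helper names hsts_store_ok and drop_world_write are hypothetical, and the sketch is not part of Pantagruel. On filesystems shared with a Windows host, permission changes may not persist. In that case, deleting the file or running wget with its --no-hsts option are alternatives.

```python
import os
import stat
from pathlib import Path


def hsts_store_ok(path):
    """True if `path` is a regular file without the world-writable bit,
    which is the condition stated in wget's HSTS message."""
    st = os.stat(path)
    return stat.S_ISREG(st.st_mode) and not (st.st_mode & stat.S_IWOTH)


def drop_world_write(path):
    """Clear the 'others may write' permission bit, keeping all other bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~stat.S_IWOTH)


if __name__ == "__main__":
    store = Path.home() / ".wget-hsts"
    if store.exists() and not hsts_store_ok(store):
        drop_world_write(store)
        print("removed world-write permission from", store)
```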

Great to hear that the test is running fine! Hopefully you can use it on a real dataset soon!