Open sinanshi opened 8 years ago
What does the old file look like?
They have a list of files called inparanoid_files.txt. You get something like this:
sqltable.M.musculus.fa-P.trichocarpa.fa
sqltable.M.mulatta.fa-P.troglodytes.fa
sqltable.A.thaliana.fa-M.musculus.fa
sqltable.A.thaliana.fa-T.nigroviridis.fa
sqltable.D.melanogaster.fa-D.rerio.fa
sqltable.S.purpuratus.fa-T.nigroviridis.fa
The files themselves look like this:
1 2740 C.hominis.fa 1.000 Chro.30328 100%
1 2740 Y.lipolytica.fa 1.000 YALI0C10648g 100%
2 1617 C.hominis.fa 1.000 Chro.30301 100%
2 1617 Y.lipolytica.fa 1.000 YALI0C16566g 100%
3 1239 C.hominis.fa 1.000 Chro.80425 100%
3 1239 Y.lipolytica.fa 1.000 YALI0C11407g 100%
4 1116 C.hominis.fa 1.000 Chro.60382 100%
4 1116 Y.lipolytica.fa 1.000 YALI0C22550g 100%
5 1049 C.hominis.fa 1.000 Chro.80341 100%
5 1049 Y.lipolytica.fa 1.000 YALI0A00352g 100%
5 1049 Y.lipolytica.fa 0.588 YALI0A20152g
6 1025 C.hominis.fa 1.000 Chro.10043 100%
6 1025 Y.lipolytica.fa 1.000 YALI0F12155g 100%
7 923 C.hominis.fa 1.000 Chro.80150 100%
7 923 Y.lipolytica.fa 1.000 YALI0A17127g 100%
8 913 C.hominis.fa 1.000 Chro.20010 100%
8 913 Y.lipolytica.fa 1.000 YALI0D08184g 99%
8 913 Y.lipolytica.fa 0.846 YALI0F25289g
8 913 Y.lipolytica.fa 0.817 YALI0E35046g
8 913 Y.lipolytica.fa 0.739 YALI0D22352g
9 891 C.hominis.fa 1.000 Chro.60546 100%
9 891 Y.lipolytica.fa 1.000 YALI0B20724g 100%
10 854 C.hominis.fa 1.000 Chro.20293 100%
10 854 Y.lipolytica.fa 1.000 YALI0B13904g 100%
11 852 C.hominis.fa 1.000 Chro.30427 100%
11 852 Y.lipolytica.fa 1.000 YALI0C07953g 100%
12 846 C.hominis.fa 1.000 Chro.60284 100%
12 846 Y.lipolytica.fa 1.000 YALI0F04169g 100%
13 842 C.hominis.fa 1.000 Chro.30434 100%
13 842 Y.lipolytica.fa 1.000 YALI0A00264g 100%
14 767 C.hominis.fa 1.000 Chro.10119 100%
14 767 Y.lipolytica.fa 1.000 YALI0A17941g 100%
15 755 C.hominis.fa 1.000 Chro.30389 100%
15 755 Y.lipolytica.fa 1.000 YALI0C17347g 99%
16 753 C.hominis.fa 1.000 Chro.80361 100%
16 753 Y.lipolytica.fa 1.000 YALI0F20218g 100%
17 753 C.hominis.fa 1.000 Chro.80294 100%
17 753 Y.lipolytica.fa 1.000 YALI0D12210g 99%
18 735 C.hominis.fa 1.000 Chro.80471 100%
18 735 Y.lipolytica.fa 1.000 YALI0B15642g 99%
19 725 C.hominis.fa 1.000 Chro.30216 100%
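For reference, a minimal Python sketch of how one row of such a table could be parsed. The names `cluster_nr`, `main_ortholog_score`, `organism` and `inparalog_score` follow the inparanoid_member column names that appear later in this thread; `protein_id` and `bootstrap` are names made up for illustration, and whitespace-separated fields are an assumption about the file format.

```python
from typing import NamedTuple, Optional


class InparanoidRow(NamedTuple):
    cluster_nr: int
    main_ortholog_score: int
    organism: str
    inparalog_score: float
    protein_id: str            # hypothetical field name
    bootstrap: Optional[str]   # e.g. "100%"; missing on some rows


def parse_row(line: str) -> InparanoidRow:
    """Parse one whitespace-separated row of a table file."""
    fields = line.split()
    if len(fields) not in (5, 6):
        raise ValueError(f"unexpected number of fields: {line!r}")
    # The last column is only present on some rows.
    bootstrap = fields[5] if len(fields) == 6 else None
    return InparanoidRow(int(fields[0]), int(fields[1]), fields[2],
                         float(fields[3]), fields[4], bootstrap)


row = parse_row("5 1049 Y.lipolytica.fa 0.588 YALI0A20152g")
```

Rows without the trailing percentage, like the one parsed here, simply get `bootstrap=None`.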
Files can be found on the old server /home/sk11/load2/inptables/inparanoid.sbc.su.se/download/7.0_current/
Could you trace it here? http://inparanoid.sbc.su.se/download/current/
I found them. http://inparanoid.sbc.su.se/download/current/Orthologs_other_formats/A.aegypti/ You have to download the .tar.gz file first and extract it, then you can find the table inside. I guess we have to build this ourselves.
I'm still downloading the files, but there are too many of them. I don't know if it is due to the recent update. There are 4959 files in the old folder, while in the new data, counting only entries starting with A, we already have 5409 files. That means we can expect around 25 times more files than in the original. Does that make sense?
Hmm, not sure. Maybe not all files are needed...
It has already been one hour since I started downloading, and it's still at B. We can expect about 10 hours of download time and around 20-30 GB of storage.
Or... we can use the old data, which can be found here: http://inparanoid.sbc.su.se/download/old_versions/data_7.0/
This is exactly the same data as the old one.
No, the whole point is to update...
Apparently the latest version has more than twice as many gene sequences as the previous one.
version | release | species | sequences |
---|---|---|---|
7.0 | 2009-06 | 100 | 1687023 |
8.0 | 2013-12 | 273 | 3718323 |
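As a rough cross-check of the file counts, here is a small sketch. It takes 100 and 273 to be the species counts of 7.0 and 8.0, and it assumes one table per unordered pair of species. That is an assumption, though it lands close to the 4959 files in the old folder.

```python
def pairwise_tables(n_species: int) -> int:
    """Number of unordered species pairs, i.e. n choose 2."""
    return n_species * (n_species - 1) // 2


old_tables = pairwise_tables(100)   # 4950
new_tables = pairwise_tables(273)   # 37128
ratio = new_tables / old_tables     # about 7.5
```

If the new download layout stores each pair once under both species' directories, the raw file count would be about twice that, so roughly 15 times the old number rather than 25. That layout is only a guess, but it would fit the count for A: there are 20 species directories starting with A in the listing, and 20 × 272 = 5440 is close to the 5409 files seen.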
Well, it is what it is, I guess. Updates are bound to be different, and I guess it is more than one version behind: Inparanoid went through 7.0.1 etc., while YOGY has not been updated for the last 6 years.
The size is too big for the server, so I'm now trying to download locally. Hopefully the files will get smaller after processing.
I suggest you come here this afternoon, maybe around 4pm and we could look at it together
7.0
1 4101 A.aegypti.fa 1.000 AAEL009959-PA 100%
1 4101 A.thaliana.fa 1.000 At1g80070.1 100%
1 4101 A.thaliana.fa 0.462 At4g38780.1
2 2380 A.aegypti.fa 1.000 AAEL011187-PA 100%
2 2380 A.thaliana.fa 1.000 At1g20960.1 100%
8.0
1 4114 A.aegypti 1.000 Q16UB0 100%
1 4114 A.thaliana 1.000 Q9SSD2 100%
1 4114 A.thaliana 0.490 F4JUG5
2 2380 A.aegypti 1.000 Q16QS5 100%
2 2380 A.thaliana 1.000 Q9SYP1 100%
2 2380 A.thaliana 0.623 O48534
Note the organism column: in 7.0 the names end in .fa, in 8.0 they do not. Does it matter? Both 7.0 and 8.0 give errors like this:
DBD::mysql::st execute failed: Duplicate entry '1---AAEL007132-PA' for key 'PRIMARY' at perl/yogy_add_inp_terms.pl line 118, <FILE> line 1.
DBD::mysql::st execute failed: Duplicate entry '2---AAEL008855-PA' for key 'PRIMARY' at perl/yogy_add_inp_terms.pl line 118, <FILE> line 3.
DBD::mysql::st execute failed: Duplicate entry '5---AAEL000307-PA' for key 'PRIMARY' at perl/yogy_add_inp_terms.pl line 118, <FILE> line 10.
Well, maybe try to keep it as similar as possible to version 7: add the .fa to column 3 and leave column 5 as is. Not sure about the duplicate entries error...
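A minimal sketch of that column 3 change in Python, in case it helps. It assumes the 8.0 rows are whitespace-separated and writes them back tab-separated; both are assumptions about the file format, and `add_fa_suffix` is just a name made up here, not something from the YOGY scripts.

```python
def add_fa_suffix(line: str) -> str:
    """Append '.fa' to the organism name in column 3 of one table row."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns: {line!r}")
    # Leave rows alone that already carry the suffix (7.0 style).
    if not fields[2].endswith(".fa"):
        fields[2] += ".fa"
    return "\t".join(fields)


converted = add_fa_suffix("1 4114 A.aegypti 1.000 Q16UB0 100%")
```

Column 5 (the protein ID) is passed through untouched, and running the function on a 7.0 row changes nothing except the separator.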
So do you suggest that we just ignore the error?
Well, not sure, but if the previous version gives the same error, we might...
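Before ignoring it, it may be worth checking which key columns actually collide. The sketch below is a guess at the mechanism, not something confirmed from the script or the table schema: if the primary key does not include a usable species pair, the same cluster number and protein can turn up in more than one pairwise file and collide. MySQL prints composite key values joined by `-`, so the run of dashes in the message could mean some key columns were empty, if I read it right. The two rows are made up around the ID from the error message.

```python
from collections import Counter


def find_duplicate_keys(rows, key_columns):
    """Return the key tuples that occur more than once in rows."""
    counts = Counter(tuple(row[i] for i in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical rows: (cluster_nr, organism_pair, protein_id).
rows = [
    ("1", "A.aegypti.fa-A.thaliana.fa", "AAEL007132-PA"),
    ("1", "A.aegypti.fa-C.elegans.fa", "AAEL007132-PA"),
]
dups_without_pair = find_duplicate_keys(rows, (0, 2))
dups_with_pair = find_duplicate_keys(rows, (0, 1, 2))
```

With the pair left out of the key the two rows collide; with it included they do not.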
What is the difference between perl/yogy_add_inp_terms.pl and perl/add_inp_terms-old.pl, and which one should I use for updating inparanoid?
I have no clue, just run the second one then with the new Inparanoid files :-)
will do.
I tried the old script last week; it takes two days to run. I checked cdc10 this morning, and the inparanoid table doesn't show up. The newly updated inparanoid_member table looks quite different from the old one. I don't know if that is just caused by the difference in the data. Please let me know if you can spot anything wrong here. The new table is shown first, then the old one.
cluster_nr | main_ortholog_score | organism | organism_pair | inparalog_score | uniprot_id |
---|---|---|---|---|---|
1 | 1083 | A.aegypti | A | 1 | Q16FG2 |
1 | 1083 | A.aeolicus | A | 1 | O67512 |
2 | 608 | A.aegypti | A | 1 | Q1HRQ7 |
2 | 608 | A.aeolicus | A | 1 | O66907 |
3 | 591 | A.aegypti | A | 1 | Q0IFX5 |
3 | 591 | A.aeolicus | A | 1 | O67618 |
4 | 590 | A.aegypti | A | 1 | Q17FL3 |
4 | 590 | A.aegypti | A | 1 | Q17H12 |
4 | 590 | A.aeolicus | A | 1 | O67828 |
5 | 581 | A.aegypti | A | 1 | Q17D48 |
5 | 581 | A.aegypti | A | 0.096 | Q16HA3 |
5 | 581 | A.aegypti | A | 0.096 | Q16J19 |
5 | 581 | A.aeolicus | A | 1 | O67411 |
cluster_nr | main_ortholog_score | organism | organism_pair | inparalog_score | uniprot_id |
---|---|---|---|---|---|
1 | 5097 | ensAG | ensAG-ensCE | 1 | AGAP002015-PA |
1 | 5097 | ensCE | ensAG-ensCE | 1 | CE23997 |
2 | 4521 | ensAG | ensAG-ensCE | 1 | AGAP001633-PA |
2 | 4521 | ensCE | ensAG-ensCE | 1 | CE33018 |
3 | 4471 | ensAG | ensAG-ensCE | 1 | AGAP010750-PA |
3 | 4471 | ensCE | ensAG-ensCE | 1 | CE43332 |
4 | 4245 | ensAG | ensAG-ensCE | 1 | AGAP006885-PA |
4 | 4245 | ensCE | ensAG-ensCE | 1 | CE00122 |
5 | 3139 | ensAG | ensAG-ensCE | 1 | AGAP000331-PA |
5 | 3139 | ensCE | ensAG-ensCE | 1 | CE05765 |
6 | 3087 | ensAG | ensAG-ensCE | 1 | AGAP006686-PA |
6 | 3087 | ensCE | ensAG-ensCE | 1 | CE07373 |
7 | 2612 | ensAG | ensAG-ensCE | 1 | AGAP001519-PA |
7 | 2612 | ensCE | ensAG-ensCE | 1 | CE21971 |
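One thing that stands out when comparing the two tables above: organism_pair is just `A` in one and `ensAG-ensCE` in the other. A possible cause, which I have not checked against the Perl script, is that the pair is taken from the table file name by splitting on `.`, which breaks once the species names themselves contain dots. The sketch below only illustrates that idea; the old-style file name `sqltable.ensAG-ensCE` is made up to match the old table.

```python
def organism_pair_from_filename(filename: str, prefix: str = "sqltable.") -> str:
    """Strip the leading prefix and keep the rest as the organism pair."""
    if filename.startswith(prefix):
        return filename[len(prefix):]
    return filename


old_name = "sqltable.ensAG-ensCE"                   # hypothetical old-style name
new_name = "sqltable.A.thaliana.fa-M.musculus.fa"   # name from inparanoid_files.txt

naive_old = old_name.split(".")[1]   # "ensAG-ensCE"
naive_new = new_name.split(".")[1]   # "A"
pair_new = organism_pair_from_filename(new_name)
```

Stripping the prefix instead of splitting on dots keeps the full pair for both naming styles.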
I have a feeling that we don't need that much inparanoid data from http://inparanoid.sbc.su.se/download/current. I don't know if we need all of the following species.
A.aegypti/ C.elegans/ D.virilis/ K.lactis/ N.gruberi/ P.sorbitophila/ T.adhaerens/ A.aeolicus/ C.familiaris/ D.willistoni/ K.pastoris/ N.haematococca/ P.tetraurelia/ T.annulata/ A.anophagefferens/ C.floridanus/ E.aedis/ L.africana/ N.leucogenys/ P.trichocarpa/ T.asahii/ A.bisporus/ C.gigas/ E.bieneusi/ L.bicolor/ N.parisii/ P.tricornutum/ T.blattae/ A.capsulata/ C.glabrata/ E.caballus/ L.braziliensis/ N.vectensis/ P.tritici-repentis/ T.brucei/ A.carolinensis/ C.globosum/ E.coli/ L.chalumnae/ N.vitripennis/ P.troglodytes/ T.castaneum/ A.cephalotes/ C.gloeosporioides/ E.cuniculi/ L.elongisporus/ O.anatinus/ P.ultimum/ T.chinensis/ A.darlingi/ C.griseus/ E.cymbalariae/ L.infantum/ O.cuniculus/ P.vivax/ T.cruzi/ A.delicata/ C.hominis/ E.dermatitidis/ L.interrogans/ O.dioica/ P.yoelii/ T.delbrueckii/ A.echinatior/ C.immitis/ E.histolytica/ L.loa/ O.garnettii/ R.baltica/ T.gondii/ A.gambiae/ C.intestinalis/ E.nidulans/ L.maculans/ O.latipes/ R.communis/ T.guttata/ A.gossypii/ C.jacchus/ E.siliculosus/ L.major/ O.niloticus/ R.delemar/ T.heterothallica/ A.gypseum/ C.japonica/ F.catus/ L.thermotolerans/ O.sativa/ R.glutinis/ T.hominis/ A.kawachii/ C.lusitaniae/ F.nucleatum/ M.acetivorans/ O.tauri/ R.norvegicus/ T.maritima/ A.melanoleuca/ C.militaris/ F.pseudograminearum/ M.acridum/ P.abelii/ Salpingoeca.sp./ T.melanosporum/ A.mellifera/ C.neoformans/ F.radiculosa/ M.brevicollis/ P.aerophilum/ S.bicolor/ T.nigroviridis/ A.oligospora/ C.owczarzaki/ G.aculeatus/ M.brunnea/ P.aeruginosa/ S.cerevisiae/ T.parva/ A.pisum/ C.parvum/ G.clavigera/ M.domestica/ P.alecto/ S.coelicolor/ T.pseudonana/ A.queenslandica/ C.porcellus/ G.destructans/ M.gallopavo/ P.berghei/ S.commune/ T.rubripes/ A.thaliana/ C.quinquefasciatus/ G.gallus/ M.globosa/ P.brasiliensis/ S.harrisii/ T.rubrum/ B.bassiana/ C.reinhardtii/ G.gorilla/ M.graminicola/ P.carnosa/ S.invicta/ T.spiralis/ B.bovis/ C.remanei/ G.graminis/ M.guilliermondii/ P.chabaudi/ S.italica/ T.stipitatus/ B.dendrobatidis/ C.savignyi/ 
G.intestinalis/ Micromonas.sp./ P.digitatum/ S.lacrymans/ T.thermophila/ B.distachyon/ C.sinensis/ G.lozoyensis/ M.jannaschii/ P.falciparum/ S.lycopersicum/ T.vaginalis/ B.floridae/ C.trachomatis/ G.max/ M.larici-populina/ P.graminis/ S.macrospora/ T.yellowstonii/ B.fuckeliana/ C.variabilis/ G.sulfurreducens/ M.lucifugus/ P.humanus/ S.mansoni/ U.maydis/ B.hominis/ D.ananassae/ G.violaceus/ M.mulatta/ P.indica/ S.moellendorffii/ U.reesii/ B.japonicum/ D.discoideum/ G.zeae/ M.musculus/ P.infestans/ S.passalidarum/ V.carteri/ B.malayi/ D.grimshawi/ H.arabidopsidis/ M.oryzae/ P.jiroveci/ S.pombe/ V.corneae/ B.mori/ D.hansenii/ H.glaber/ M.osmundae/ P.knowlesi/ S.purpuratus/ V.culicis/ B.rapa/ D.melanogaster/ H.salinarum/ M.perniciosa/ P.kodakaraensis/ S.reilianum/ V.dahliae/ B.subtilis/ D.mojavensis/ H.saltator/ M.phaseolina/ P.marinus/ S.sclerotiorum/ V.polyspora/ B.taurus/ D.plexippus/ H.sapiens/ M.putorius/ P.nodorum/ S.scrofa/ V.vinifera/ B.thetaiotaomicron/ D.pseudoobscura/ H.virens/ M.tuberculosis/ P.pacificus/ S.solfataricus/ W.bancrofti/ C.albicans/ D.pulex/ H.vulgare/ N.caninum/ P.pallidum/ S.stipitis/ W.ciferrii/ C.aurantiacus/ D.purpureum/ I.multifiliis/ N.castellii/ P.patens/ stderr/ W.sebi/ C.brenneri/ D.radiodurans/ I.scapularis/ N.ceranae/ P.placenta/ S.tridecemlineatus/ X.maculatus/ C.briggsae/ D.rerio/ K.africana/ N.crassa/ P.ramorum/ S.tuberosum/ X.tropicalis/ C.cinerea/ D.turgidum/ K.cryptofilum/ N.fumigata/ P.sojae/ Synechocystis.sp./ Y.lipolytica/
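If only a subset of species is needed, the table list could be filtered before downloading or loading. A minimal sketch, assuming file names of the form `sqltable.X.fa-Y.fa` as in inparanoid_files.txt; the `wanted` set is a hypothetical selection, not a proposal for which species YOGY needs.

```python
def species_in_table(filename: str, prefix: str = "sqltable."):
    """Split 'sqltable.X.fa-Y.fa' into ('X', 'Y')."""
    rest = filename[len(prefix):] if filename.startswith(prefix) else filename
    # Split on '.fa-' rather than '-' because some names contain a hyphen,
    # e.g. P.tritici-repentis.
    left, sep, right = rest.partition(".fa-")
    if not sep or not right.endswith(".fa"):
        raise ValueError(f"unexpected table name: {filename!r}")
    return left, right[:-len(".fa")]


def keep_wanted(filenames, wanted):
    """Keep only tables where both species are in the wanted set."""
    return [f for f in filenames if set(species_in_table(f)) <= set(wanted)]


tables = [
    "sqltable.M.musculus.fa-P.trichocarpa.fa",
    "sqltable.A.thaliana.fa-M.musculus.fa",
    "sqltable.D.melanogaster.fa-D.rerio.fa",
]
wanted = {"A.thaliana", "M.musculus", "D.rerio"}  # hypothetical selection
selected = keep_wanted(tables, wanted)
```

Since the number of tables grows with the square of the number of species, even a modest cut in the species list reduces the download considerably.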
Broken link: http://inparanoid.cgb.ki.se/download/current/sqltables/ (see https://github.com/Bahler-Lab/yogy/blob/master/YogiUp/run_db.csh#L40)