Closed lachlancoin closed 5 years ago
I think it would be helpful if we could get both taxonomy and color from a file.
I’ve tried to go through the process of manually connecting a sequence identifier to its taxonomy and color information, and I think I'm missing a piece.
For example:
after alignment stage we have a sequence identifier gi|386331671|ref|NC_017574.1|
we can lookup speciesIndex
cat speciesIndex | grep 'gi|386331671|ref|NC_017574.1|'
and get:
Ralstonia_solanacearum >gi|386331671|ref|NC_017574.1| Ralstonia solanacearum Po82 chromosome, complete genome #a6b5f2ff
I’ve checked in commontree.txt.css: there is a Ralstonia solanacearum Po82 node. But I’m not sure how I should search for it with the information from the previous step. Should I always drop the “chromosome” part from the name obtained from the “speciesIndex” file? Or maybe transforming names to a slug could help in this situation?
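One way to avoid matching on the free-text description is to reduce the identifier to its bare accession (here NC_017574.1) and look that up instead. The following is a minimal sketch, assuming the pipe-delimited `gi|<number>|<db>|<accession>|` layout shown above. The class and method names are hypothetical and not part of japsa, and whether the tree file contains a node keyed by the accession is also an assumption.

```java
import java.util.Optional;

final class AccessionParser {
    // Extracts the accession from text containing an identifier such as
    // "gi|386331671|ref|NC_017574.1|". Assumes the layout
    // gi|<number>|<db>|<accession>|; anything else yields Optional.empty().
    static Optional<String> extractAccession(String text) {
        int start = text.indexOf("gi|");
        if (start < 0) {
            return Optional.empty();
        }
        String[] parts = text.substring(start).split("\\|");
        if (parts.length < 4 || parts[3].trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(parts[3].trim());
    }

    public static void main(String[] args) {
        String line = "Ralstonia_solanacearum >gi|386331671|ref|NC_017574.1| "
                + "Ralstonia solanacearum Po82 chromosome, complete genome";
        System.out.println(extractAccession(line).orElse("(no accession found)"));
    }
}
```

The same method accepts either the bare identifier from the alignment stage or a whole speciesIndex line, so the description text never has to be normalised.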
Yes, I am not sure I can solve this disconnect either, unfortunately. I probably need to figure out how to use NCBI eutils. I can match most of the entries in speciesIndex, but not all.
OK I can solve this now, except that there are a number of nodes which are unclassified (I will still improve it)
I copied a new tree to GCP:
gsutil cp gs://nano-stream1/CombinedDatabases/commontree.txt.css.mod
I am in the process of a new japsa release which will have a class for reading this file, you can see how it works here:
https://github.com/mdcao/japsa/blob/master/src/test/java/japsadev/bio/phylo/NCBITreeTest.java
ok I published a new release v1.9-3a with this functionality
Great! I think the possibility to search by "NC_.." identifiers in the tree is exactly what we need. I'll work on an implementation that uses it for coloring and hierarchy.
In the latest japsa release you can write:
File commontree = new File("commontree.txt.css.mod");
NCBITree t = new NCBITree(commontree);
String[][] taxa = t.getTaxonomy("NC...");
taxa[0] is the list of taxa
taxa[1] is the corresponding list of css
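For coloring, the two parallel arrays can be zipped into a name-to-color lookup. This is a minimal sketch that does not depend on japsa: it only assumes the `String[][]` shape described above (index 0 holds names, index 1 holds colors, both of equal length). The class and method names are hypothetical, and the sample values are copied from output reported later in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TaxonColors {
    // Pairs each taxon name with its css color, preserving the original order.
    // Assumes taxa[0] (names) and taxa[1] (colors) are parallel arrays.
    static Map<String, String> toColorMap(String[][] taxa) {
        if (taxa.length < 2 || taxa[0].length != taxa[1].length) {
            throw new IllegalArgumentException("expected two parallel arrays");
        }
        Map<String, String> colors = new LinkedHashMap<>();
        for (int i = 0; i < taxa[0].length; i++) {
            colors.put(taxa[0][i], taxa[1][i]);
        }
        return colors;
    }

    public static void main(String[] args) {
        // Stand-in for the result of t.getTaxonomy(...).
        String[][] taxa = {
            {"NC_011584.1", "East African cassava mosaic virus TZ_19", "East African cassava mosaic virus"},
            {"#e9c9ccff", "#e9c9ccff", "#deafb4ff"}
        };
        toColorMap(taxa).forEach((name, css) -> System.out.println(name + " -> " + css));
    }
}
```

An empty result (two empty arrays) produces an empty map, which a caller can treat as "no color known" and fall back to a default.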
-- Group leader, Institute for Molecular Bioscience, University of Queensland Senior Lecturer, Imperial College http://academickarma.org/0000-0002-4300-455X http://orcid.org/0000-0002-4300-455X
OK I just copied a better tree file,
gsutil cp commontree.txt.css.mod gs://nano-stream1/CommonDatabases
almost all nodes are assigned now
I've applied the coloring scheme from the commontree.txt.css.mod file and noticed that it contains mostly very dark colors, for example:
\-Bacteria css=#000000ff
...
| | | +-Pandoraea css=#000201ff
...
Viruses css=#000000ff
It looks completely black on the visualization:
Coloring scheme was very different in the slug_bacteria.txt and slug_viral.txt files.
Yes, you are right, I must have introduced a bug which decreased the lightness. It is supposed to scale so that nodes close to the root are dark and nodes close to the tips are light.
OK, I found the bug (I was using a 0 to 1 scale when it should have been 0 to 100).
The new version is copied across:
gsutil cp commontree.txt.css.mod gs://nano-stream1/CombinedDatabases
the colors seem better:
| | +-Pandoraea css=#e495e7ff height=1.00
| | | -Pandoraea pnomenusa css=#eaabecff height=0.900
| | | -Pandoraea pnomenusa 3kgm css=#efc1f1ff height=0.800
| | | | -NC_023018.1 css=#eaabecff alias=Pandoraea_RB alias1=Pandoraea sp. RB-44 genome height=0.00
| | | | -NC_022904.1 css=#eaabecff alias=Pandoraea_pnomenusa
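The intended behaviour (dark near the root, light near the tips) and the effect of the scale mix-up can be illustrated with a small HSL sketch. This is not the japsa implementation: the hue, saturation, 20% to 80% lightness range, and all names below are assumptions chosen for illustration. The point is that a lightness meant for a 0 to 100 scale but supplied as a 0 to 1 value is at most 1%, which renders as near-black.

```java
final class DepthColor {
    // Converts HSL to "#rrggbbaa" with full opacity.
    // h is in degrees [0, 360); s and l are percentages in [0, 100].
    static String hslToCss(double h, double sPct, double lPct) {
        double s = sPct / 100.0;
        double l = lPct / 100.0;
        double c = (1 - Math.abs(2 * l - 1)) * s;        // chroma
        double hp = h / 60.0;                            // hue sector
        double x = c * (1 - Math.abs(hp % 2 - 1));
        double r = 0, g = 0, b = 0;
        if (hp < 1)      { r = c; g = x; }
        else if (hp < 2) { r = x; g = c; }
        else if (hp < 3) { g = c; b = x; }
        else if (hp < 4) { g = x; b = c; }
        else if (hp < 5) { r = x; b = c; }
        else             { r = c; b = x; }
        double m = l - c / 2;                            // lightness offset
        return String.format("#%02x%02x%02xff", to255(r + m), to255(g + m), to255(b + m));
    }

    private static int to255(double v) {
        return (int) Math.round(Math.max(0, Math.min(1, v)) * 255);
    }

    // Lightness in percent for a node: 0.0 means root, 1.0 means tip.
    // The 20..80 range is an assumption for this sketch.
    static double lightnessPercent(double depthFraction) {
        return 20 + 60 * depthFraction;
    }

    public static void main(String[] args) {
        double hue = 300, saturation = 60;
        System.out.println("root: " + hslToCss(hue, saturation, lightnessPercent(0.0)));
        System.out.println("tip:  " + hslToCss(hue, saturation, lightnessPercent(1.0)));
        // Same tip, but with lightness mistakenly given on a 0..1 scale:
        System.out.println("bug:  " + hslToCss(hue, saturation, lightnessPercent(1.0) / 100.0));
    }
}
```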
I also added a new class for running all the steps to generate the CSS in japsadev.tools.makeCSS.CSSProcessCommand
The last step of this process is to get the taxonomy and colors via the following code:
public static void test(File treein, String[] totest){
  try{
    NCBITree t = new NCBITree(treein);
    for(int i=0; i<totest.length; i++){
      String[][] taxa = t.getTaxonomy(totest[i]);
      LOG.info(totest[i]);
      LOG.info(""+Arrays.asList(taxa[0])); // taxa list
      LOG.info(""+Arrays.asList(taxa[1])); // colors
    }
  }catch(Exception exc){
    exc.printStackTrace();
  }
}
For the input
String[] totest = "Homo sapiens:Capnocytophaga canimorsus:Staphylococcus aureus:NC_023018.1".split(":");
the results are:
[main] INFO japsadev.tools.makeCSS.CSSProcessCommand - Homo sapiens
[main] INFO japsadev.tools.makeCSS.CSSProcessCommand - [Homo sapiens, Primates, Euarchontoglires, Boreoeutheria, Sarcopterygii, Euteleostomi, Deuterostomia, Bilateria, Metazoa, Opisthokonta, Eukaryota, cellular organisms]
[main] INFO japsadev.tools.makeCSS.CSSProcessCommand - [#bbdcf6ff,
[main] INFO japsadev.tools.makeCSS.CSSProcessCommand - Capnocytophaga canimorsus
[main] INFO japsadev.tools.makeCSS.CSSProcessCommand - [Capnocytophaga canimorsus, Capnocytophaga, Flavobacteriaceae, Flavobacteriales, Flavobacteriia, Bacteroidetes, Bacteroidetes/Chlorobi group, FCB group, Bacteria, cellular organisms]
[main] INFO japsadev.tools.makeCSS.CSSProcessCommand - [#ccfd99ff,
[main] INFO japsadev.tools.makeCSS.CSSProcessCommand - Staphylococcus aureus
[main] INFO japsadev.tools.makeCSS.CSSProcessCommand - [Staphylococcus aureus, Staphylococcus, Staphylococcaceae, Bacillales, Bacilli, Firmicutes, Terrabacteria group, Bacteria, cellular organisms]
[main] INFO japsadev.tools.makeCSS.CSSProcessCommand - [#ddcd9fff,
[main] INFO japsadev.tools.makeCSS.CSSProcessCommand - NC_023018.1
[main] INFO japsadev.tools.makeCSS.CSSProcessCommand - [NC_023018.1, Pandoraea pnomenusa 3kgm, Pandoraea pnomenusa, Pandoraea, Burkholderiaceae, Burkholderiales, Betaproteobacteria, Proteobacteria, Bacteria, cellular organisms]
[main] INFO japsadev.tools.makeCSS.CSSProcessCommand - [#eaabecff,
Thank you! I think bacteria hierarchy has very nice coloring now:
For now to test coloring I've just dumped name - color correspondence into javascript. We'll work on adopting Japsa methods into pipeline.
Is it possible to have the same tree structure for antibiotic resistance genes?
ok
I copied the resistance gene tree:
Copying file://CombinedDatabases/resistancetree.txt.css [Content-Type=text/css]...
I'm trying to integrate NCBITree. With some references it works fine, while for others I receive an empty result. For example, take these two leaves in the tree:
| | | | | \->NC_011584.1 css=#e9c9ccff alias=East_African_Cassava_Mosaic ...
| | | | | \->AJ717516.1 css=#e9c9ccff alias=Eas
For NC_011584.1
String[][] taxa = t.getTaxonomy("NC_011584.1");
System.out.println(""+Arrays.asList(taxa[0]));
System.out.println(""+Arrays.asList(taxa[1]));
returns as expected:
[NC_011584.1, East African cassava mosaic virus TZ_19, East African cassava mosaic virus, Begomovirus, Geminiviridae, ssDNA viruses, Viruses]
[#e9c9ccff, #e9c9ccff, #deafb4ff, #c97c8cff, #c16085ff, #a03984ff, #000000ff]
But for AJ717516.1
String[][] taxa = t.getTaxonomy("AJ717516.1");
System.out.println(""+Arrays.asList(taxa[0]));
System.out.println(""+Arrays.asList(taxa[1]));
Result is two empty lists:
[]
[]
Could you help me to debug this issue?
Yes, it seems that the "->" confuses my regex (for some reason ->NC_011584.1 escapes this because there is another entry with -NC_011584.1).
Anyway the very quick fix is:
sed 's/->/-/g' commontree.txt.css.mod > commontree.txt.css.mod1
mv commontree.txt.css.mod1 commontree.txt.css.mod
I have confirmed this fixes the problem and I will upload it to the bucket:
AJ717516.1
[main] INFO japsadev.tools.makeCSS.CSSProcessCommand - [AJ717516.1, East African cassava mosaic virus TZ_19, East African cassava mosaic virus, Begomovirus, Geminiviridae, ssDNA viruses, Viruses]
[main] INFO japsadev.tools.makeCSS.CSSProcessCommand - [#e9c9ccff, #e9c9ccff, #deafb4ff, #c97c8cff, #c16085ff, #a03984ff, #000000ff]
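An alternative to rewriting the file is a line parser that accepts both forms of the branch marker. This is a minimal sketch, assuming a node line consists of drawing characters (`|`, whitespace, `+`, `\`, `-` and an optional `>`) followed by the node name and a `css=` attribute with eight hex digits. The regex and class name are hypothetical and are not the pattern japsa uses.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TreeLineParser {
    // Group 1: node name (may contain spaces). Group 2: css color.
    // The leading character class swallows every drawing character,
    // so "-NAME", "+-NAME", "\-NAME" and "\->NAME" are all treated alike.
    private static final Pattern NODE =
            Pattern.compile("^[|\\s+\\\\>-]*(.+?)\\s+css=(#[0-9a-fA-F]{8})\\b.*$");

    // Returns {name, css} for a node line, or empty if the line does not match.
    static Optional<String[]> parse(String line) {
        Matcher m = NODE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {m.group(1), m.group(2)});
    }

    public static void main(String[] args) {
        String[] lines = {
            "| | | | | \\->AJ717516.1 css=#e9c9ccff alias=Eas",
            "| | | | -NC_023018.1 css=#eaabecff alias=Pandoraea_RB",
            "| | +-Pandoraea css=#e495e7ff height=1.00"
        };
        for (String line : lines) {
            parse(line).ifPresent(p -> System.out.println(p[0] + " " + p[1]));
        }
    }
}
```

A limitation of this sketch is that a node name which itself begins with one of the drawing characters would lose that leading character.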
have updated the fixed file:
gsutil cp commontree.txt.css.mod gs://nano-stream1/CombinedDatabases
will modify my pipelines to detect this in future and fix it
Sorry, I mean it's fixed for now (you can use the new commontree.txt.css.mod), but I will make sure that the pipelines are also fixed for future generation of this file.
Some updates on the coloring. I have moved the coloring scripts into main japsa (out of dev) so that they get into the release properly. The package is now japsa.bio.phylo.
I have put it all in the release japsa-1.9-3c
You can call it like this:
String[] totest = "NC_023018.1:Homo sapiens:Capnocytophaga canimorsus:Staphylococcus aureus:NC_023018.1:NC_002645.1".split(":");
File treein = new File("commontree.txt.css.mod");
NCBITree t = new NCBITree(treein, false);
for(int i=0; i<totest.length; i++){
  String[][] taxa = t.getTaxonomy(totest[i]);
  LOG.info(totest[i]);
  LOG.info(""+Arrays.asList(taxa[0])); // taxa list
  LOG.info(""+Arrays.asList(taxa[1])); // colors
}
Also, I have spent a bit of time making sure the trees can be built automatically from the taxdump files on NCBI. This is now working well. You can build the trees from japsa.bio.phylo.CSSProcessCommand, with the input speciesIndex obtained from grep '>' genomeDB.fasta > speciesIndex, as long as you have a taxdump/ directory (or symbolic link). Anyway, this need not concern you at this stage, it is just for your information!
Note that the trees produced this way may be very slightly different from previous tree, so if using the new code you should get new trees from
gs://nano-stream1/Databases/CombinedDatabase/commontree.txt.css.out
I have made a new release which is
The current pipeline is calling an NCBI endpoint for taxonomy information.
I have a potential alternative solution which relies on reading in the tree from a flat file.
I have created a tree in gs://nano-stream1/CombinedDatabases/commontree.txt.css Note that a tab separates the tree information from the css information in this file
I have committed a class in japsa which is: dev/java/japsadev/bio/phylo/NCBITree.java
This class can read this tree format, and could provide the required phylogenetic information as well as CSS information.
Currently this commit is on a branch (generate_css) and not in the main branch but if you want to use this instead of NCBI api we can make a new release with this functionality.