allenday / nanostream-dataflow

real-time stream processing of DNA nanopore sequencer reads with dataflow
MIT License

NCBI endpoint #55

Closed lachlancoin closed 5 years ago

lachlancoin commented 5 years ago

The current pipeline is calling an NCBI endpoint for taxonomy information.

I have a potential alternative solution which relies on reading in the tree from a flat file.

I have created a tree at gs://nano-stream1/CombinedDatabases/commontree.txt.css. Note that a tab separates the tree information from the CSS information in this file.
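As a rough illustration of the tab-separated layout described above, here is a minimal Java sketch that splits one line into its tree part and its CSS part. It is not the japsa parser: the class and method names are hypothetical, the sample line is modeled on lines quoted later in this thread, and it assumes that any further fields after the first tab are also tab-separated.

```java
class TreeLineSketch {
    /** Splits a line at the first tab: [0] is the tree drawing plus node name, [1] is the rest. */
    static String[] splitLine(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new String[] { line, "" }; // no tab, so no CSS part on this line
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    /** Returns the value of the css= field, or null if absent (fields assumed tab-separated). */
    static String cssColor(String attributes) {
        for (String field : attributes.split("\t")) {
            if (field.startsWith("css=")) {
                return field.substring("css=".length());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Illustrative line only; the real file may differ in detail.
        String line = "| | +-Pandoraea\tcss=#e495e7ff\theight=1.00";
        String[] parts = splitLine(line);
        System.out.println(parts[0]);
        System.out.println(cssColor(parts[1]));
    }
}
```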

I have committed a class in japsa: dev/java/japsadev/bio/phylo/NCBITree.java

This class can read this tree format, and could provide the required phylogenetic information as well as CSS information.

Currently this commit is on a branch (generate_css), not on the main branch, but if you want to use this instead of the NCBI API we can make a new release with this functionality.

obsh commented 5 years ago

I think it would be helpful if we can get both taxonomy and color from file.

I’ve tried to go through the process manually, connecting an identifier to its taxonomy and color information, and I think I'm missing a piece.

For example: after the alignment stage we have a sequence identifier gi|386331671|ref|NC_017574.1|, which we can look up in speciesIndex:

cat speciesIndex | grep 'gi|386331671|ref|NC_017574.1|'

and get:

Ralstonia_solanacearum >gi|386331671|ref|NC_017574.1| Ralstonia solanacearum Po82 chromosome, complete genome #a6b5f2ff

I’ve checked commontree.txt.css and there is a Ralstonia solanacearum Po82 node, but I’m not sure how I should search for it with the information from the previous step. Should I always drop the “chromosome” part from the name obtained from the “speciesIndex” file? Or maybe transforming names to slugs could help in this situation?

lachlancoin commented 5 years ago

Yes, unfortunately I am not sure I can solve this disconnect either. I probably need to figure out how to use NCBI eutils. I can match most of the entries in speciesIndex, but not all.

lachlancoin commented 5 years ago

OK I can solve this now, except that there are a number of nodes which are unclassified (I will still improve it)

I copied a new tree to GCP:

gsutil cp gs://nano-stream1/CombinedDatabases/commontree.txt.css.mod

I am in the process of a new japsa release which will have a class for reading this file, you can see how it works here:

https://github.com/mdcao/japsa/blob/master/src/test/java/japsadev/bio/phylo/NCBITreeTest.java

lachlancoin commented 5 years ago

ok I published a new release v1.9-3a with this functionality

obsh commented 5 years ago

Great! I think the ability to search by "NC_.." identifiers in the tree is exactly what we need. I'll work on an implementation that uses it for coloring and hierarchy.
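To search the tree by accession, the accession first has to be pulled out of the aligner's sequence identifier. A minimal sketch, assuming identifiers follow the gi|number|db|accession| pattern of the example earlier in this thread; the class and method names are hypothetical, and whether a given accession is actually present in the tree is not checked here.

```java
class AccessionSketch {
    /**
     * Extracts the accession from an identifier such as "gi|386331671|ref|NC_017574.1|".
     * Identifiers that do not match the assumed pattern are returned unchanged
     * (apart from a leading '>').
     */
    static String accession(String identifier) {
        String id = identifier.startsWith(">") ? identifier.substring(1) : identifier;
        String[] parts = id.split("\\|"); // '|' must be escaped in a regex
        if (parts.length >= 4 && parts[0].equals("gi")) {
            return parts[3]; // gi | number | db tag | accession
        }
        return id;
    }

    public static void main(String[] args) {
        System.out.println(accession("gi|386331671|ref|NC_017574.1|"));
        System.out.println(accession("NC_023018.1"));
    }
}
```

The result could then be passed to a lookup keyed by accession.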

lachlancoin commented 5 years ago

In the latest japsa release you can write:

    File commontree = new File("commontree.txt.css.mod");
    NCBITree t = new NCBITree(commontree);
    String[][] taxa = t.getTaxonomy("NC...");

taxa[0] is the list of taxa; taxa[1] is the corresponding list of CSS colors.
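Since the two arrays are parallel, a consumer will typically want them paired up. The following is a minimal sketch of one way to do that; it does not call NCBITree, the class and method names are hypothetical, and the sample arrays are hard-coded from output quoted elsewhere in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TaxaColorSketch {
    /** Pairs taxa[0][i] (name) with taxa[1][i] (color), preserving the original order. */
    static Map<String, String> pair(String[][] taxa) {
        String[] names = taxa[0];
        String[] colors = taxa[1];
        if (names.length != colors.length) {
            throw new IllegalArgumentException("taxa and color lists differ in length");
        }
        Map<String, String> byName = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            byName.put(names[i], colors[i]);
        }
        return byName;
    }

    public static void main(String[] args) {
        // Stand-in for the result of getTaxonomy(...); values copied from this thread.
        String[][] taxa = {
            { "NC_011584.1", "East African cassava mosaic virus TZ_19",
              "East African cassava mosaic virus", "Begomovirus", "Geminiviridae",
              "ssDNA viruses", "Viruses" },
            { "#e9c9ccff", "#e9c9ccff", "#deafb4ff", "#c97c8cff", "#c16085ff",
              "#a03984ff", "#000000ff" }
        };
        pair(taxa).forEach((name, color) -> System.out.println(name + " -> " + color));
    }
}
```

A LinkedHashMap is used so that the order from the queried node up to the root is kept.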


-- Group leader, Institute for Molecular Bioscience, University of Queensland Senior Lecturer, Imperial College http://academickarma.org/0000-0002-4300-455X http://orcid.org/0000-0002-4300-455X

lachlancoin commented 5 years ago

OK, I just copied a better tree file:

gsutil cp commontree.txt.css.mod gs://nano-stream1/CommonDatabases

Almost all nodes are assigned now.


obsh commented 5 years ago

I've applied the coloring scheme from the commontree.txt.css.mod file and noticed that it contains mostly very dark colors, for example:

\-Bacteria      css=#000000ff
...
    | | | +-Pandoraea   css=#000201ff
...
Viruses css=#000000ff

It looks completely black on the visualization: image

Coloring scheme was very different in the slug_bacteria.txt and slug_viral.txt files.

lachlancoin commented 5 years ago

Yes, you are right, I must have introduced a bug which decreased the lightness. It is supposed to scale so that nodes close to the root are dark and nodes close to the tips are light.


lachlancoin commented 5 years ago

OK, I found the bug (I was using a 0 to 1 scale when I should have been using 0 to 100).

The new version is copied across:

cp commontree.txt.css.mod gs://nano-stream1/CombinedDatabases

the colors seem better:

| | +-Pandoraea css=#e495e7ff height=1.00
| | | -Pandoraea pnomenusa css=#eaabecff height=0.900
| | | -Pandoraea pnomenusa 3kgm css=#efc1f1ff height=0.800
| | | | -NC_023018.1 css=#eaabecff alias=Pandoraea_RB alias1=Pandoraea sp. RB-44 genome height=0.00
| | | | -NC_022904.1 css=#eaabecff alias=Pandoraea_pnomenusa
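The effect of a 0 to 1 versus 0 to 100 mix-up on lightness can be reproduced with a small, self-contained sketch. This is not the japsa code: it assumes an HSL color model with saturation and lightness given in percent, and the class and method names are hypothetical. Passing a fraction such as 0.5 where a percentage is expected yields an almost black color, which matches the symptom reported above.

```java
class LightnessSketch {
    /** Converts HSL to "#rrggbbaa"; hue in degrees, saturation and lightness in PERCENT (0-100). */
    static String cssColor(double hueDeg, double satPct, double lightPct) {
        double s = satPct / 100.0;
        double l = lightPct / 100.0;
        double c = (1 - Math.abs(2 * l - 1)) * s;           // chroma
        double hp = (((hueDeg % 360) + 360) % 360) / 60.0;  // hue sector in [0, 6)
        double x = c * (1 - Math.abs(hp % 2 - 1));
        double r, g, b;
        if (hp < 1)      { r = c; g = x; b = 0; }
        else if (hp < 2) { r = x; g = c; b = 0; }
        else if (hp < 3) { r = 0; g = c; b = x; }
        else if (hp < 4) { r = 0; g = x; b = c; }
        else if (hp < 5) { r = x; g = 0; b = c; }
        else             { r = c; g = 0; b = x; }
        double m = l - c / 2;                                // shifts all channels to match lightness
        return String.format("#%02x%02x%02xff", to255(r + m), to255(g + m), to255(b + m));
    }

    /** Scales a 0-1 channel to 0-255 and clamps it. */
    static int to255(double v) {
        return (int) Math.max(0, Math.min(255, Math.round(v * 255)));
    }

    public static void main(String[] args) {
        double fraction = 0.5; // intended lightness, expressed on a 0-1 scale
        System.out.println("correct scale: " + cssColor(300, 100, fraction * 100));
        System.out.println("wrong scale:   " + cssColor(300, 100, fraction)); // nearly black
    }
}
```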


lachlancoin commented 5 years ago

I also added a new class for running all the steps to generate the CSS in japsadev.tools.makeCSS.CSSProcessCommand

The last step of this process is to get the taxonomy and colors via the following code:

    public static void test(File treein, String[] totest) {
        try {
            NCBITree t = new NCBITree(treein);
            for (int i = 0; i < totest.length; i++) {
                String[][] taxa = t.getTaxonomy(totest[i]);
                LOG.info(totest[i]);
                LOG.info("" + Arrays.asList(taxa[0])); // taxa
                LOG.info("" + Arrays.asList(taxa[1])); // corresponding CSS colors
            }
        } catch (Exception exc) {
            exc.printStackTrace();
        }
    }

For the input

    String[] totest = "Homo sapiens:Capnocytophaga canimorsus:Staphylococcus aureus:NC_023018.1".split(":");

the results are:

    [main] INFO japsadev.tools.makeCSS.CSSProcessCommand - Homo sapiens
    [main] INFO japsadev.tools.makeCSS.CSSProcessCommand - [Homo sapiens, Primates, Euarchontoglires, Boreoeutheria, Sarcopterygii, Euteleostomi, Deuterostomia, Bilateria, Metazoa, Opisthokonta, Eukaryota, cellular organisms]
    [main] INFO japsadev.tools.makeCSS.CSSProcessCommand - [#bbdcf6ff, #a3d0f3ff, #8bc4f0ff, #73b7edff, #5babeaff, #1877c0ff, #1568a8ff, #0c3d60ff, #092e48ff, #061f30ff, #030d18ff, #000000ff]

    [main] INFO japsadev.tools.makeCSS.CSSProcessCommand - Capnocytophaga canimorsus
    [main] INFO japsadev.tools.makeCSS.CSSProcessCommand - [Capnocytophaga canimorsus, Capnocytophaga, Flavobacteriaceae, Flavobacteriales, Flavobacteriia, Bacteroidetes, Bacteroidetes/Chlorobi group, FCB group, Bacteria, cellular organisms]
    [main] INFO japsadev.tools.makeCSS.CSSProcessCommand - [#ccfd99ff, #bffd7eff, #affd64ff, #9afc49ff, #8bfb2fff, #64fa15ff, #48ed06ff, #28d207ff, #0db607ff, #000000ff]

    [main] INFO japsadev.tools.makeCSS.CSSProcessCommand - Staphylococcus aureus
    [main] INFO japsadev.tools.makeCSS.CSSProcessCommand - [Staphylococcus aureus, Staphylococcus, Staphylococcaceae, Bacillales, Bacilli, Firmicutes, Terrabacteria group, Bacteria, cellular organisms]
    [main] INFO japsadev.tools.makeCSS.CSSProcessCommand - [#ddcd9fff, #d6c48bff, #cfbc76ff, #c1a84eff, #b38e41ff, #9ba335ff, #34f63bff, #0db607ff, #000000ff]

    [main] INFO japsadev.tools.makeCSS.CSSProcessCommand - NC_023018.1
    [main] INFO japsadev.tools.makeCSS.CSSProcessCommand - [NC_023018.1, Pandoraea pnomenusa 3kgm, Pandoraea pnomenusa, Pandoraea, Burkholderiaceae, Burkholderiales, Betaproteobacteria, Proteobacteria, Bacteria, cellular organisms]
    [main] INFO japsadev.tools.makeCSS.CSSProcessCommand - [#eaabecff, #efc1f1ff, #eaabecff, #e495e7ff, #dc69dcff, #c251d9ff, #5934dbff, #1c1ba3ff, #0db607ff, #000000ff]


obsh commented 5 years ago

Thank you! I think bacteria hierarchy has very nice coloring now:

image

For now, to test the coloring, I've just dumped the name-to-color correspondence into JavaScript. We'll work on adopting the Japsa methods in the pipeline.

obsh commented 5 years ago

Is it possible to have the same tree structure for antibiotic resistance genes?

lachlancoin commented 5 years ago

ok


lachlancoin commented 5 years ago

I copied the resistance gene tree:

Copying file://CombinedDatabases/resistancetree.txt.css [Content-Type=text/css]...


obsh commented 5 years ago

I'm trying to integrate NCBITree. With some references it works fine, while for others I receive an empty result. For example, take these two leaves in the tree:

| | | | | \->NC_011584.1       css=#e9c9ccff   alias=East_African_Cassava_Mosaic ...
| | | | | \->AJ717516.1        css=#e9c9ccff   alias=Eas

For NC_011584.1

    String[][] taxa = t.getTaxonomy("NC_011584.1");
    System.out.println(""+Arrays.asList(taxa[0]));
    System.out.println(""+Arrays.asList(taxa[1]));

returns as expected:

[NC_011584.1, East African cassava mosaic virus TZ_19, East African cassava mosaic virus, Begomovirus, Geminiviridae, ssDNA viruses, Viruses]
[#e9c9ccff, #e9c9ccff, #deafb4ff, #c97c8cff, #c16085ff, #a03984ff, #000000ff]

But for AJ717516.1

    String[][] taxa = t.getTaxonomy("AJ717516.1");
    System.out.println(""+Arrays.asList(taxa[0]));
    System.out.println(""+Arrays.asList(taxa[1]));

Result is two empty lists:

[]
[]

Could you help me to debug this issue?

lachlancoin commented 5 years ago

Yes, it seems that the "->" confuses my regex (for some reason ->NC_011584.1 escapes this because there is another entry with -NC_011584.1).

Anyway the very quick fix is:

    sed 's/->/-/g' commontree.txt.css.mod > commontree.txt.css.mod1
    mv commontree.txt.css.mod1 commontree.txt.css.mod

I have confirmed this fixes the problem and I will upload it to the bucket:

    AJ717516.1
    [main] INFO japsadev.tools.makeCSS.CSSProcessCommand - [AJ717516.1, East African cassava mosaic virus TZ_19, East African cassava mosaic virus, Begomovirus, Geminiviridae, ssDNA viruses, Viruses]
    [main] INFO japsadev.tools.makeCSS.CSSProcessCommand - [#e9c9ccff, #e9c9ccff, #deafb4ff, #c97c8cff, #c16085ff, #a03984ff, #000000ff]
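For a pipeline that prefers to do this in Java rather than with sed, the same substitution, plus a check that flags affected lines, could look roughly like the sketch below. The class and method names are hypothetical and the sample lines are shortened versions of those quoted in this thread. Like the sed command, it replaces every "->" on a line, including any that might occur inside alias text.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ArrowFixSketch {
    /** True if the line contains the "->" marker that the lookup reportedly trips over. */
    static boolean hasArrow(String line) {
        return line.contains("->");
    }

    /** Replaces every "->" with "-" on each line, mirroring sed 's/->/-/g'. */
    static List<String> normalize(List<String> lines) {
        return lines.stream()
                .map(line -> line.replace("->", "-"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "| | | | | \\->NC_011584.1\tcss=#e9c9ccff",
                "| | | | | \\->AJ717516.1\tcss=#e9c9ccff");
        long affected = lines.stream().filter(ArrowFixSketch::hasArrow).count();
        System.out.println("lines containing '->': " + affected);
        normalize(lines).forEach(System.out::println);
    }
}
```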


lachlancoin commented 5 years ago

have updated the fixed file:

gsutil cp commontree.txt.css.mod gs://nano-stream1/CombinedDatabases

will modify my pipelines to detect this in future and fix it


lachlancoin commented 5 years ago

Sorry, I mean it's fixed for now (you can use the new commontree.txt.css.mod), but I will make sure that the pipelines are also fixed for future generation of this file.


lachlancoin commented 5 years ago

Some updates on the coloring. I have moved the coloring scripts into main japsa (out of dev) so that they get into the release properly. The package is now japsa.bio.phylo.

I have put it all in the release japsa-1.9-3c

You can call it like this:

    String[] totest = "NC_023018.1:Homo sapiens:Capnocytophaga canimorsus:Staphylococcus aureus:NC_023018.1:NC_002645.1".split(":");
    File treein = new File("commontree.txt.css.mod");
    NCBITree t = new NCBITree(treein, false);
    for (int i = 0; i < totest.length; i++) {
        String[][] taxa = t.getTaxonomy(totest[i]);
        LOG.info(totest[i]);
        LOG.info("" + Arrays.asList(taxa[0])); // taxa list
        LOG.info("" + Arrays.asList(taxa[1])); // colors
    }

Also I have spent a bit of time making sure the trees can be built automatically from the taxdump files on NCBI. This is now working well. You can build the trees with japsa.bio.phylo.CSSProcessCommand, with the input speciesIndex obtained from grep '>' genomeDB.fasta > speciesIndex, as long as you have a taxdump/ directory (or symbolic link). Anyway, this need not concern you at this stage; it is just for your information!
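For reference, the grep step that produces speciesIndex amounts to keeping every line that contains a '>' character, which in a FASTA file means the header lines. A minimal Java sketch of that filter follows; the class name is hypothetical, and the sequence lines and the second header are made-up placeholders rather than real data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SpeciesIndexSketch {
    /** Keeps lines containing '>', mirroring grep '>' applied to a FASTA file. */
    static List<String> headerLines(List<String> fastaLines) {
        List<String> headers = new ArrayList<>();
        for (String line : fastaLines) {
            if (line.contains(">")) {
                headers.add(line);
            }
        }
        return headers;
    }

    public static void main(String[] args) {
        // In-memory stand-in for genomeDB.fasta; sequence content is a placeholder.
        List<String> fasta = Arrays.asList(
                ">gi|386331671|ref|NC_017574.1| Ralstonia solanacearum Po82 chromosome, complete genome",
                "ACGTACGTACGT",
                ">NC_023018.1 placeholder header",
                "TTGACATTGACA");
        headerLines(fasta).forEach(System.out::println);
    }
}
```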

Note that the trees produced this way may be very slightly different from previous tree, so if using the new code you should get new trees from

gs://nano-stream1/Databases/CombinedDatabase/commontree.txt.css.out

I have made a new release which is