Error with the "Simple Hypothesis Testing" in "Example Pipeline"

firasmidani / amiga

Analysis of Microbial Growth Assays

https://firasmidani.github.io/amiga/

GNU General Public License v3.0


Error with the "Simple Hypothesis Testing" in "Example Pipeline" #6

Closed: BinhongLiu closed this issue 2 years ago.

BinhongLiu commented 2 years ago

Hi, great tool, thank you! I tested the "Example Pipeline" (https://firasmidani.github.io/amiga/doc/example.html) with the randomii dataset (https://github.com/firasmidani/amiga/tree/master/examples/randomii), and it ran well until the "Simple Hypothesis Testing" step. Could you help me with this error? Many thanks!

BinhongLiu commented 2 years ago

AMiGA was installed in a conda virtual environment on a Linux system. Here are the packages installed in the AMiGA environment.

BinhongLiu commented 2 years ago

I also tried my own data, 0121.txt, which has these growth curves:

There were two groups, a control group and an inulin group, each with three replicate wells: control in wells B4, C4, and D4, and inulin in wells E4, F4, and G4.

When I fitted the data, no results were obtained for wells E4 and F4, even though their growth curves looked normal. Is there anything wrong with the dataset? I used this command:

amiga fit -i data/0121.txt --merge-summary -o "0121" --verbose

I also tried to plot the fitted curves of the control and inulin groups to show their differences, like this: Could I obtain the data for the bold lines in the figure? I want to plot the fitted curves in R with ggplot2. My understanding is that each well produces one fitted curve, so the mean with a 95% confidence interval of the three replicate wells in each group could be plotted.

firasmidani commented 2 years ago

The first issue was caused by a bug in a recent update to AMiGA. I have fixed it and updated the repository.

Please re-install AMiGA and test again. If you cloned the repository, you can simply run git pull; otherwise, replace the old amiga folder with the new one. Let me know if that fixes the problem.
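
If you are unsure whether your copy is a git clone, a small helper like the one below can decide for you. It is a sketch only: the function name `update_amiga` and the default folder name `amiga` are hypothetical. It runs `git pull` when the folder contains a `.git` directory and otherwise reminds you to replace the folder by hand.

```shell
# Hypothetical helper: update an AMiGA folder in place when it is a git clone.
update_amiga() {
  dir="${1:-amiga}"   # assumed location of the AMiGA folder
  if [ -d "$dir/.git" ]; then
    # Cloned copy: fetch and merge the latest commits.
    git -C "$dir" pull
  else
    # Downloaded copy: nothing to pull, so the folder must be replaced manually.
    echo "No git clone found in '$dir'; replace it with a fresh download." >&2
    return 1
  fi
}

# Usage (assumed path): update_amiga ~/tools/amiga
```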

firasmidani commented 2 years ago

I ran your data and I was able to generate results for all six wells.

The fit function should generate a primary file in the summary folder with the suffix "_summary.txt". This file has the growth parameters for all samples. If your growth curves also show multi-phasic shifts, AMiGA may also generate an additional file in the summary folder with the suffix "_diauxie.txt", which characterizes the growth parameters for each unique phase in each growth curve. In your case, E4 and F4 did not show any multi-phasic changes, so they were not included in the diauxie file. Your other wells show interesting changes in growth rate. For example, B4 has three phases: one between 0 and 3 hours, one between 3 and 6 hours, and a third after 6 hours, during which it continues to grow a bit at a slower rate.
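
Because the diauxie file only lists wells with multi-phasic growth, comparing its well list with the summary file shows which wells were left out. The sketch below assumes both files are tab-separated with a header row and that wells are identified by a column named `Well`; that column name, the function name, and the file names in the usage line are assumptions, so adjust them to your own headers.

```shell
# List wells present in the summary file but absent from the diauxie file.
# Arguments: SUMMARY_FILE DIAUXIE_FILE [COLUMN_NAME]  ("Well" is an assumed default)
wells_without_diauxie() {
  awk -F'\t' -v name="${3:-Well}" '
    # On each file header, find the position of the well column.
    FNR == 1 { c = 0; for (i = 1; i <= NF; i++) if ($i == name) c = i; next }
    # First file read is the diauxie file: remember every well it mentions.
    FILENAME == ARGV[1] { if (c) seen[$c] = 1; next }
    # Summary file: print each well not seen in the diauxie file, once.
    c && !($c in seen) && !($c in printed) { print $c; printed[$c] = 1 }
  ' "$2" "$1"
}

# Usage (hypothetical file names):
# wells_without_diauxie summary/0121_summary.txt summary/0121_diauxie.txt
```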

You can find more information on how AMiGA detects different phases on the Detect Diauxie documentation page. It also explains how to adjust the thresholds AMiGA uses to decide whether a shift in growth or growth rate should be recorded. The default behavior is quite liberal, so it will record many changes.

firasmidani commented 2 years ago

I'm glad to see that you are interested in this function of AMiGA!

First, this is very important: AMiGA has a special use for the Group and Control columns. Currently, it handles these columns properly only if their values are integers. I will update AMiGA shortly so that it can also handle string (text) values, as in your example mapping file.
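
Until text values are supported, one workaround is to recode a text Group or Control column into integer codes before running AMiGA. The awk sketch below (the function name `recode_column` is hypothetical) replaces each distinct label in a named column with 1, 2, 3, ... in order of first appearance. That numbering is only an illustration; check the AMiGA documentation for which values these columns are expected to hold, especially Control.

```shell
# Replace the text labels in a named column with integer codes (1, 2, 3, ...).
# Handles mapping files whose header has no field for the well-ID column.
recode_column() {
  awk -F'\t' -v OFS='\t' -v name="$2" '
    NR == 1 {
      hn = NF
      for (i = 1; i <= NF; i++) if ($i == name) col = i
      if (!col) { print "column not found: " name > "/dev/stderr"; exit 1 }
      print
      next
    }
    {
      j = col + NF - hn                  # shift by one if rows carry an unnamed well ID
      if (!($j in code)) code[$j] = ++n  # number labels in order of first appearance
      $j = code[$j]
      print
    }
  ' "$1"
}

# Usage (hypothetical paths): recode_column mapping/0121.txt Group > mapping/0121_recoded.txt
```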

Regarding your question: at the moment, the test function does not save the predicted models for the two groups being compared. In the model folder inside your working directory, you will find all of the data that the test function saves, including the input data and the statistical results.

However, you can still get the predicted growth curves shown in the plot in another way using the fit function with pooling. Here is how.

Using your data as an example, I created a mapping file.

Plate_ID	Isolate	Substrate	Replicate
B4	0121	DA816	Control	1
C4	0121	DA816	Control	2
D4	0121	DA816	Control	3
E4	0121	DA816	Inulin	1
F4	0121	DA816	Inulin	2
G4	0121	DA816	Inulin	3
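
The same mapping file can be written from the shell. This sketch assumes AMiGA looks for a tab-separated mapping file named after the data file inside a `mapping` folder of the working directory; verify that location against the AMiGA documentation. As in the table above, the header has no field for the well IDs, so each data row has one more field than the header.

```shell
# Write the example mapping file for plate 0121 (tab-separated).
# Assumed location: <working directory>/mapping/0121.txt
mkdir -p mapping
{
  printf 'Plate_ID\tIsolate\tSubstrate\tReplicate\n'
  # printf reuses its format for each group of three arguments: well, substrate, replicate.
  printf '%s\t0121\tDA816\t%s\t%s\n' \
    B4 Control 1  C4 Control 2  D4 Control 3 \
    E4 Inulin 1   F4 Inulin 2   G4 Inulin 3
} > mapping/0121.txt
```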

Of course, you can analyze each sample individually as follows:

amiga fit -i data/0121.txt -o "0121_individual" --merge-summary

You can also save the predicted fit for each growth curve by adding the --save-gp-data argument:

amiga fit -i data/0121.txt -o "0121_individual" --merge-summary --save-gp-data

The --save-gp-data argument will force AMiGA to save a file in the derived folder with the model fit for all growth curves.
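
Before plotting, it helps to check which columns the saved file contains. The helper below (the name `list_columns` is hypothetical) prints each header field with its position. The file name in the usage line follows the naming pattern of the `0121_pooled_gp_data.txt` example further down and is an assumption for the individual run.

```shell
# Print "position<TAB>name" for every field in the header row of a tab-separated file.
list_columns() {
  head -n 1 "$1" | tr '\t' '\n' | awk '{ printf "%d\t%s\n", NR, $0 }'
}

# Usage (assumed file name): list_columns derived/0121_individual_gp_data.txt
```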

In your case, you are interested in testing whether growth differs by Substrate (or Group):

amiga test -i data/0121.txt -y "H0:Time;H1:Time+Substrate"  -o "test_substrate" -np 0 --verbose

This shows a clear difference between the two groups.

But to get the actual predicted fits, you can do the following:

amiga fit -i data/0121.txt -o "0121_pooled" --pool-by "Substrate" --save-gp-data

To get the confidence intervals for these predicted curves, you can use the following:

amiga get_confidence -i ./derived/0121_pooled_gp_data.txt --type "Curves" --confidence 95

In the new file ./derived/0121_pooled_gp_data_confidence.txt, you will find columns for the mean (mu), the lower (low), and upper (upp) bounds.
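
For plotting in R with ggplot2, you may only need the time, group, mean, and bounds. The awk sketch below (the function name `select_columns` is hypothetical) picks columns by header name and writes them as comma-separated values. It assumes the confidence file is tab-separated with a header row; only `mu`, `low`, and `upp` are named above, so the `Time` and `Substrate` column names in the usage line are assumptions to check against your file.

```shell
# Select named columns from a tab-separated file and print them as CSV.
# Usage: select_columns FILE COL1,COL2,...
select_columns() {
  awk -F'\t' -v OFS=',' -v want="$2" '
    NR == 1 {
      n = split(want, names, ",")
      for (k = 1; k <= n; k++) {
        for (i = 1; i <= NF; i++) if ($i == names[k]) idx[k] = i
        if (!idx[k]) { print "missing column: " names[k] > "/dev/stderr"; exit 1 }
      }
    }
    # Every row, header included, is reduced to the requested columns in order.
    {
      line = $(idx[1])
      for (k = 2; k <= n; k++) line = line OFS $(idx[k])
      print line
    }
  ' "$1"
}

# Usage (Time and Substrate are assumed column names):
# select_columns derived/0121_pooled_gp_data_confidence.txt Time,Substrate,mu,low,upp > curves.csv
```

In R, curves.csv could then be read with read.csv() and drawn with geom_line(aes(x = Time, y = mu, colour = Substrate)) plus geom_ribbon(aes(x = Time, ymin = low, ymax = upp, fill = Substrate), alpha = 0.3).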

For more details about these, see the following pages of the documentation:

- Pool Replicates
- Estimate Confidence
- Command Line Interface (describes the output of --save-gp-data)

I hope this clears up any confusion.

BinhongLiu commented 2 years ago

Yes. Now it works well! Thanks!

BinhongLiu commented 2 years ago

Great! I think I need to adjust the thresholds used by AMiGA!

BinhongLiu commented 2 years ago

Cool! That's so nice of you! This really helps me a lot and makes it much clearer! Thank you so much!

BinhongLiu commented 2 years ago

Hi! Just one more question. Based on the log Bayes factor, I now know that the growth of the two species differs, but I cannot tell whether the difference is caused specifically by growth rate or by carrying capacity. Could AMiGA provide standard errors or standard deviations for the fitted growth parameters, so that we could test whether these specific parameters differ significantly between the two species? Thanks!

firasmidani commented 2 years ago

You can already do this with AMiGA; it is described on the Compare Parameters documentation page for use with the fit function. For the test function, make sure to include the --sample-posterior argument, which asks AMiGA to also generate summary statistics for the growth parameters. Using your example:

amiga test -i data/0121.txt -y "H0:Time;H1:Time+Substrate"  -o "test_substrate" -np 0 --sample-posterior --verbose

You will find the results in the model/test_substrate/test_substrate_params.txt file.
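
If that file reports a mean and a standard deviation for a parameter (for example, growth rate or carrying capacity) in each group, a rough check is to compare the difference of means with its spread. The awk sketch below (the function name `diff_interval` is hypothetical) assumes the two estimates are independent and approximately normal, and returns the difference with a 95% interval of difference ± 1.96 × sqrt(sd1² + sd2²). This is a generic approximation, not necessarily how AMiGA summarizes parameter differences, so prefer the values AMiGA reports where available.

```shell
# Difference of two group means with an approximate 95% interval,
# assuming independent, roughly normal estimates.
# Usage: diff_interval MEAN1 SD1 MEAN2 SD2   -> prints "diff lower upper"
diff_interval() {
  awk -v m1="$1" -v s1="$2" -v m2="$3" -v s2="$4" 'BEGIN {
    d  = m1 - m2
    se = sqrt(s1 * s1 + s2 * s2)   # standard deviation of the difference
    printf "%.3f %.3f %.3f\n", d, d - 1.96 * se, d + 1.96 * se
  }'
}

# Example with made-up numbers: diff_interval 0.5 0.03 0.3 0.04   # prints 0.200 0.102 0.298
```

An interval that excludes zero suggests the two groups differ in that parameter.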

By the way, if you have multiple unrelated issues or requests, I am always glad to help, but please submit a separate ticket for each issue. It makes it easier for me to organize and refer back to them. Thank you.