biocore / songbird

Vanilla regression methods for microbiome differential abundance analysis
BSD 3-Clause "New" or "Revised" License

Qiime Songbird Multinomial - 'Data Frame' Error #166

Closed nagr4657 closed 1 year ago

nagr4657 commented 1 year ago

Hi all,

I am running 'qiime songbird multinomial' (using ubuntu 22.04.2 LTS with Qiime2-2020.6) to eventually perform differential abundance testing. I have run into an issue where my input .qza data files sometimes work and sometimes do not. I am working on multiple projects with similar workflows, and as such have similarly generated feature tables. I installed and started by successfully running the function on one of my datasets (FMT): #1

image

I then tried running the same function using a different .qza feature table (in a different directory - acetate) and received this data frame error: #2

image

After some troubleshooting, I figured this issue had to be due to my input .qza file so as a test I re-ran my very first data set again to see if my original feature table (FMT) would still work. As you can see from the code below, I used the exact same files as in image #1 but this time instead of generating my outputs I received another error message: #3

image

Any ideas what might be causing this 'DataFrame' error? I tried using 'qiime dev refresh-cache' and that didn't seem to alleviate the problem.

Thank you!

fedarko commented 1 year ago

Hi @nagr4657,

This is interesting—my guess is that, at some point between the first and second images, something about your QIIME 2 conda environment changed. This error is a symptom of an incompatible (i.e. "too new") pandas version being used, which in turn suggests that your conda environment may be changing somehow in between these two images. Anecdotally, I've seen that sometimes conda can mess up and say that it's still in one conda environment (and then still behave, somewhat, like it's in a different environment).
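One way to confirm which environment is really in use is to ask the Python interpreter directly. The sketch below is a minimal, hypothetical helper (`describe_environment` is not part of QIIME 2 or Songbird). It assumes conda sets the `CONDA_DEFAULT_ENV` variable on activation, and it reports `None` for pandas if pandas cannot be imported.

```python
import os
import sys


def describe_environment():
    """Collect details that reveal which Python environment is really active."""
    try:
        import pandas  # imported lazily so the report still works without it
        pandas_version = pandas.__version__
        pandas_location = os.path.dirname(pandas.__file__)
    except ImportError:
        pandas_version = None
        pandas_location = None
    return {
        "python_executable": sys.executable,
        "environment_prefix": sys.prefix,
        # Set by conda on activation; it can disagree with sys.prefix
        # if the shell and the interpreter are out of sync.
        "conda_default_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "pandas_version": pandas_version,
        "pandas_location": pandas_location,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this once right after a successful Songbird run and once after a failing one, then comparing the two reports, would show whether the interpreter or the pandas version changed in between.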

If you could please provide the following data, that would help a lot with debugging this:

Also, if you are able to reproduce the first "success" (where you are able to run songbird on the FMT dataset), could you try running songbird on the second (acetate) dataset without leaving the current directory? For example, something like

qiime songbird multinomial \
    --i-table ~/acetate2023/152656_filtered_table_acetateBL1800.qza \
    [other parameters go here...]

If this succeeds, then it proves that something is going wrong when you change directories into ~/acetate2023. If it fails, then something even stranger is going on.

Thanks!

mortonjt commented 1 year ago

I'll also add that you may want to look into your input file more carefully (i.e. qiime tools peek FMT_BL_only_table.qza) to make sure that they are biom tables (not csvs or tab delimited tables).
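For readers who want to see roughly what `qiime tools peek` reports, a `.qza` artifact is a zip archive whose top-level folder holds a `metadata.yaml` file with `uuid`, `type` and `format` fields. The sketch below reads that file with the standard library only. `read_qza_metadata` is a hypothetical helper, the YAML parsing is deliberately naive, and it is a rough stand-in for `peek` rather than a replacement.

```python
import zipfile


def read_qza_metadata(qza_path):
    """Return the key/value pairs from the top-level metadata.yaml of a .qza."""
    with zipfile.ZipFile(qza_path) as archive:
        # The top-level metadata file lives at <uuid>/metadata.yaml;
        # deeper copies (e.g. under provenance/) are skipped.
        candidates = [
            name for name in archive.namelist()
            if name.count("/") == 1 and name.endswith("/metadata.yaml")
        ]
        if not candidates:
            raise ValueError(f"no top-level metadata.yaml found in {qza_path}")
        text = archive.read(candidates[0]).decode("utf-8")

    metadata = {}
    for line in text.splitlines():
        # Naive "key: value" parsing; enough for the flat fields in this file
        key, separator, value = line.partition(":")
        if separator:
            metadata[key.strip()] = value.strip()
    return metadata
```

For a table that Songbird can use, `read_qza_metadata("FMT_BL_only_table.qza")["type"]` would be expected to read `FeatureTable[Frequency]`. Any other type suggests the artifact was imported as something else.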

But I agree with @fedarko it looks like something changed in your environment.

nagr4657 commented 1 year ago

Hi @fedarko,

Thank you very much for your quick response! Here are the data that you requested:

-Attempting to run songbird again in the FMT2023 directory: image

-Outputs of the requests within the Acetate2023 directory: image

-Attempting to run songbird on the acetate feature table again, but this time in the FMT2023 directory: image

Thank you very much, Nathan

mortonjt commented 1 year ago

Hi @nagr4657, thank you for the update -- but I don't think this is a pandas issue. My best guess is that your FMT_BL_only_table.qza is misformatted (songbird thinks that your input counts are in the biom format when they aren't). Running the qiime tools peek FMT_BL_only_table.qza command will verify this.

nagr4657 commented 1 year ago

Hi @mortonjt,

Thank you for the suggestion. I am curious why it worked originally, since I used the exact same '--i-table'?

Here is the output from running tools peek: image

Any thoughts on what is causing this issue? I am using Qiita to generate the input file: image

I tried using both the .qza and the .biom outputs from qiita (screenshot above).

When I use the .qza file for songbird the output was: image

When I use the .biom file for songbird the output is: image image

I apologize if I am missing something rather basic that is contributing to this issue.

Thank you, Nathan

mortonjt commented 1 year ago

Hi @nagr4657, got it. I'm looking at your pandas version again and realized that it may be out-of-sync with the original songbird version. See this issue for another example of this: https://github.com/biocore/songbird/issues/128

Can you try downgrading to pandas=0.25 to see if you still get this issue?

nagr4657 commented 1 year ago

Hi @mortonjt ,

Sorry for the delay - it took a really long time to downgrade to pandas=0.25. I think the process was successful, but given how long it took and how much code was run I am not quite sure: image

However, when I tried to run songbird I am still getting a 'DataFrame' error. image

Any thoughts on what else I could try? Thanks!

mortonjt commented 1 year ago

Hi - I still don't think you have the right pandas version — otherwise you would have the to_dense function.
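A quick way to test this directly is to ask pandas whether `DataFrame.to_dense` still exists. The sketch below assumes, as the comment above implies, that a Songbird-compatible pandas provides this method. To my knowledge it was deprecated in pandas 0.25 and removed in 1.0, but treat that as an assumption. `has_to_dense` is a hypothetical helper, not part of either library.

```python
def has_to_dense(frame_class):
    """Return True if the given DataFrame class still exposes `to_dense`."""
    return callable(getattr(frame_class, "to_dense", None))


if __name__ == "__main__":
    try:
        import pandas
    except ImportError:
        print("pandas is not importable from this interpreter")
    else:
        print("pandas version:", pandas.__version__)
        print("DataFrame.to_dense available:", has_to_dense(pandas.DataFrame))
```

If this prints `False` from inside the activated QIIME 2 environment, the downgrade did not take effect for the interpreter that Songbird is using, whatever the install log said.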

It's hard for me to say how exactly your install is configured. Based on your previous posts, it sounds like the pandas version is the culprit. At this point, the easiest route is probably to uninstall / reinstall songbird as specified in the install instructions.


nagr4657 commented 1 year ago

Hi @mortonjt

I am sorry for having to continue this discussion. I am not sure what I am doing wrong. Here is what I did:

1) uninstalled songbird: image

2) I reinstalled songbird in qiime using pip: image

3) I tried rerunning songbird multinomial using the same feature table and metadata file: image

Should I try uninstalling a different way?

Best, Nathan

fedarko commented 1 year ago

Hi @nagr4657,

Thank you for your assistance in debugging this. I think I see the problem we are currently stuck on: if you have an incompatible pandas version (you have version 1.1.5 installed), then re-installing Songbird won't change the pandas version -- this is because Songbird's setup files say that it only requires a pandas version above a certain limit (#117), even though this is false. Sorry; this is a known problem with Songbird.

Fortunately, this means that fixing the problem should be simple. I'm not sure that the earlier attempt to downgrade the pandas version worked, since I still see a pandas version of 1.1.5 listed near the top of this screenshot. To downgrade pandas in this situation, I recommend running these two commands:

pip uninstall pandas
pip install "pandas==0.25.3"

After this, you can run pip list | grep pandas to check what version of pandas is installed, like before. If the pandas re-installation process worked, you should see something like pandas 0.25.3. At this point, you should be able to use Songbird again.

If the above steps don't work for some reason, then the nuclear approach is just re-installing QIIME 2 2020.6 (creating a new conda environment). You can do this by following the same installation instructions that I assume you used before, but now just using a different conda environment name (e.g. conda env create -n qiime2-2020.6-v2 [...] instead of conda env create -n qiime2-2020.6 [...]). I'm pretty sure this should fix the problem, but hopefully the less-nuclear approach detailed above should resolve this problem without requiring you to re-install QIIME 2 :)

Let us know how this goes!

nagr4657 commented 1 year ago

Hi @fedarko and @mortonjt

Sorry for the delayed response. I tried uninstalling and reinstalling the pandas version as you suggested. I was successful in doing so, but this did not solve the issue with songbird. I went with the more nuclear approach, and in the process of uninstalling and re-installing qiime2-2020.6 I hit quite a few hiccups that I was not able to solve until today. This issue may or may not have been related to the issues I was having previously (https://forum.qiime2.org/t/installation-of-qiime2-successful-but-cant-activate-due-to-inability-in-finding-conda-enviroment/26017/9).

Regardless, I was able to re-install qiime2-2020.6 and was able to successfully run songbird on both of my data sets today.

Thank you both very VERY much for your help troubleshooting!

Best, Nathan

fedarko commented 1 year ago

Thanks for letting us know, @nagr4657! Sorry for all the trouble; glad the issue is solved.

mestaki commented 1 year ago

Hey folks, I know this is closed and the culprit was determined to be compatibility with the newest pandas, but I just wanted to share what I think is causing users to run into this issue. QIIME 2 2020.6 comes with pandas 0.25.3, which meets songbird's minimum requirement of 0.18, so installing songbird doesn't upgrade pandas. However, qurro, which is often installed alongside songbird, does have a "pandas >= 1" requirement, which forces the environment to upgrade pandas, breaking songbird in the process. Keeping qurro separate is the easiest solution here, assuming the q2 plugins for these are not going to be upgraded for newest versions?
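The interaction described above can be illustrated with a small version comparison. This is a simplified sketch, not how pip resolves dependencies. The lower bounds (0.18 for songbird, 1 for qurro) are taken from this comment rather than read from either package's setup files, and `parse_version` only handles plain dotted numbers.

```python
def parse_version(text, width=3):
    """Turn a dotted version such as '0.25.3' into a zero-padded tuple of ints."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies_minimum(installed, minimum):
    """Check a '>=' style requirement, the only kind discussed in this thread."""
    return parse_version(installed) >= parse_version(minimum)


# Lower bounds as described in the comment above (assumed, not verified)
REQUIREMENTS = {"songbird": "0.18", "qurro": "1"}

for installed in ("0.25.3", "1.1.5"):
    for package, minimum in REQUIREMENTS.items():
        ok = satisfies_minimum(installed, minimum)
        print(f"pandas {installed} satisfies {package} (>= {minimum}): {ok}")
```

pandas 0.25.3 meets songbird's bound but not qurro's, so installing qurro triggers an upgrade. pandas 1.1.5 meets both bounds, so reinstalling songbird afterwards has no reason to downgrade it, even though songbird fails at runtime with that version.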

fedarko commented 1 year ago

Thanks @mestaki! That makes sense; I apologize for the trouble. A few months ago I updated Qurro to work with the pandas versions in newer QIIME 2 environments, and this update had the unfortunate effect of making Qurro not work with older QIIME 2 environments. (The silly thing is that, before this update, we had this same problem in reverse -- installing Qurro into new QIIME 2 environments would break those also ._.)

I think it might be possible to adjust Qurro's code to repeatedly detect which version of pandas is installed and do different things accordingly, but I don't have time to overhaul it that extensively now. The "ideal" solution would probably be updating Songbird to work with newer pandas versions, but as I understand it recent efforts have been focused more on BIRDMAn.

"Keeping qurro separate is the easiest solution here, assuming the q2 plugins for these are not going to be upgraded for newest versions?" (quoting @mestaki)

Given an old QIIME 2 environment (e.g. 2020.6) into which Songbird has been installed, I think an even easier way to use Qurro is to install a slightly old version of it (v0.7.1) that expects older pandas versions:

Using pip: pip install "qurro==0.7.1"
Using conda: conda install -c conda-forge "qurro=0.7.1"

This way, we avoid the need to create a new separate conda environment just for Qurro. (Although doing that would also work.)

This slightly-old version of Qurro, v0.7.1, is basically the same as the latest version (v0.8.0) -- the main difference between the two was the adjustment in v0.8.0 to work with newer pandas versions.