PayneLab / cptac

Python packaging for CPTAC data

no attribute error #7

Closed a00101 closed 4 years ago

a00101 commented 4 years ago

Dear developer, I got the error below with cptac == 0.8.0:

AttributeError: module 'cptac' has no attribute 'list_datasets'

Please help me out.

Thanks.

caleb-lindgren commented 4 years ago

Thanks for reaching out! I took a quick look, and the cptac.list_datasets function is still there in the most recent release, so I'm not sure why you'd be getting that error. Could you show me the code that generated this error, so I can understand better what's going on?

a00101 commented 4 years ago

This is all of my code:

import cptac
cptac.list_datasets()
Traceback (most recent call last):
  File "C:\Data\04.Programs\Sublime Text Build 3200 x64\BOLUS\cptac.py", line 1, in <module>
    import cptac
  File "C:\Data\04.Programs\Sublime Text Build 3200 x64\BOLUS\cptac.py", line 2, in <module>
    cptac.list_datasets()
AttributeError: module 'cptac' has no attribute 'list_datasets'
[Finished in 0.1s with exit code 1]
[shell_cmd: python -u "C:\Data\04.Programs\Sublime Text Build 3200 x64\BOLUS\cptac.py"]
[dir: C:\Data\04.Programs\Sublime Text Build 3200 x64\BOLUS]
[path: C:\Program Files (x86)\Common Files\Oracle\Java\javapath;C:\Rtools\bin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Users\ddhb\AppData\Local\Programs\Python\Python37-32;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\Bandizip\;C:\WNS Files\OfficeDOC;C:\Users\ddhb\AppData\Local\Programs\Python\Python37-32\Scripts]

caleb-lindgren commented 4 years ago

What is the output if you enter cptac.version() in the Python prompt after importing cptac? And, from your shell (not the Python prompt), what is the output of the pip list command?

a00101 commented 4 years ago

Thank you for your rapid reply. But I have some bad news.

import cptac
print(cptac.version())

Traceback (most recent call last):
  File "C:\Data\04.Programs\Sublime Text Build 3200 x64\BOLUS\cptac.py", line 1, in <module>
    import cptac
  File "C:\Data\04.Programs\Sublime Text Build 3200 x64\BOLUS\cptac.py", line 2, in <module>
    print(cptac.version())
AttributeError: module 'cptac' has no attribute 'version'

pip list shows: cptac 0.8.0

a00101 commented 4 years ago

This error seems to depend on the OS, specifically Windows. On Linux, it works well.

But I got another error:

cptac.download(dataset="Endometrial")
cptac error: Insufficient internet. Check your internet connection. (, line 1)

caleb-lindgren commented 4 years ago

Alright, I think I figured out what's causing the "no attribute" error. I noticed in your stacktrace that the file you're coding in is called cptac.py. The stack trace says it's located at C:\Data\04.Programs\Sublime Text Build 3200 x64\BOLUS\cptac.py. I think what's happening is that when you say import cptac, Python first looks in your current directory and sees the file called cptac.py, so it thinks you're trying to import that file, instead of importing the installed cptac package. Then, instead of having a reference to the actual package, you just have a reference to your local script. This behavior has been brought up before (see here for example), and though it can be confusing, it is an intentional feature of the Python import system.

I tried this on my own computer, by creating a file called cptac.py that tries to import and use the cptac package, and I got the same error you did. Try changing the name of your file to something other than cptac.py, such as cptac_test.py and that should fix it.
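To check for this kind of name clash before renaming anything, a small diagnostic along the following lines may help. It is only a sketch: the helper names find_module_origin and local_shadow are made up for illustration and are not part of cptac, and it only covers the simple case of a same-named file or folder in one directory.

```python
import importlib.util
from pathlib import Path


def find_module_origin(name):
    """Return the file Python would load for `name`, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def local_shadow(name, directory="."):
    """Return a local `<name>.py` or `<name>/__init__.py` in `directory`, if any.

    Such a file can take precedence over an installed package of the same
    name when scripts are run from that directory.
    """
    base = Path(directory)
    for candidate in (base / f"{name}.py", base / name / "__init__.py"):
        if candidate.is_file():
            return candidate
    return None
```

For example, print(local_shadow("cptac")) run from the script's folder should show the clashing cptac.py if there is one, and print(find_module_origin("cptac")) shows which file an import would actually use. After a plain import, cptac.__file__ gives the same information.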

The second error sometimes arises if your WiFi signal is weak. If a download fails multiple times due to a spotty connection, it will just cancel it. Do you think you could try finding a place with a stronger WiFi signal?

a00101 commented 4 years ago

@caleb-lindgren Thank you very much for your kind reply. Thanks to it, importing the package now works well. However, I am on a wired connection, not WiFi, and I still get the "insufficient internet" error. Is there another solution I could try? I am also searching for other solutions myself.

caleb-lindgren commented 4 years ago

I'm glad the package import works now!

As far as the "insufficient internet" error, it just means that there was some error when the package tried to download the data files. I'm pretty sure the error is from some problem with your internet connection, not due to any problem with the package, because I just tested the download utilities and they worked fine. The data files can be fairly large (usually 10-20 MB), so maybe that's causing problems for you? I would suggest looking into some way to test the download capacity of your internet connection. Maybe you need more bandwidth.
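One rough way to tell a blocked connection apart from a slow one is to check whether a TCP connection to the download host can be opened at all. The sketch below uses only the standard library. The function name can_connect is made up for illustration, and the host to test is an assumption that has to be filled in, since this thread does not name the download server.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    This only tests reachability (DNS lookup plus TCP handshake). It says
    nothing about bandwidth, and it is not part of the cptac package.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and failed DNS lookups.
        return False
```

If can_connect(...) returns False for the download host but True for other sites, something between the machine and that host (for example a proxy or firewall) is probably blocking it, and more bandwidth would not help.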

caleb-lindgren commented 4 years ago

You could try going somewhere with a better internet connection to download the data files for the first time. Then after that, you wouldn't need to download them anymore, so you'd be able to use the data with minimal internet.

a00101 commented 4 years ago

First of all, thank you for your answer.

The problem was the institute's firewall. I have worked around it by downloading the data at home and uploading it to the institute's Linux server.

The cancer type I am interested in is LUAD. However, LUAD seems to be under embargo, and the code below raises an error.

There is no error for the other carcinomas (endometrial and colon), so it seems to be specific to the LUAD dataset. Can you confirm?

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import cptac
cptac.download(dataset="luad", version="latest")
en = cptac.Luad()

A1BG_cross = en.join_omics_to_omics(df1_name="proteomics", df2_name="transcriptomics", genes1="A1BG",genes2="A1BG")
A1BG_cross.head()

sns.set(style="darkgrid")
plot = sns.regplot(x=A1BG_cross.columns[0], y=A1BG_cross.columns[1], 
                   data=A1BG_cross)
plot.set(xlabel='Proteomics', ylabel='Transcriptomics', 
         title='Proteomics vs. Transcriptomics for the A1BG gene')
plt.show()

TypeError                                 Traceback (most recent call last)
<ipython-input-29-46ba17850eed> in <module>
      1 sns.set(style="darkgrid")
      2 plot = sns.regplot(x=A1BG_cross.columns[0], y=A1BG_cross.columns[1], 
----> 3                    data=A1BG_cross)
      4 plot.set(xlabel='Proteomics', ylabel='Transcriptomics', 
      5          title='Proteomics vs. Transcriptomics for the A1BG gene')

~/anaconda3/lib/python3.7/site-packages/seaborn/regression.py in regplot(x, y, data, x_estimator, x_bins, x_ci, scatter, fit_reg, ci, n_boot, units, seed, order, logistic, lowess, robust, logx, x_partial, y_partial, truncate, dropna, x_jitter, y_jitter, label, color, marker, scatter_kws, line_kws, ax)
    808                                  order, logistic, lowess, robust, logx,
    809                                  x_partial, y_partial, truncate, dropna,
--> 810                                  x_jitter, y_jitter, color, label)
    811 
    812     if ax is None:

~/anaconda3/lib/python3.7/site-packages/seaborn/regression.py in __init__(self, x, y, data, x_estimator, x_bins, x_ci, scatter, fit_reg, ci, n_boot, units, seed, order, logistic, lowess, robust, logx, x_partial, y_partial, truncate, dropna, x_jitter, y_jitter, color, label)
    129 
    130         # Save the range of the x variable for the grid later
--> 131         self.x_range = self.x.min(), self.x.max()
    132 
    133     @property

~/anaconda3/lib/python3.7/site-packages/numpy/core/_methods.py in _amin(a, axis, out, keepdims, initial, where)
     32 def _amin(a, axis=None, out=None, keepdims=False,
     33           initial=_NoValue, where=True):
---> 34     return umr_minimum(a, axis, None, out, keepdims, initial, where)
     35 
     36 def _sum(a, axis=None, dtype=None, out=None, keepdims=False,

TypeError: cannot perform reduce with flexible type

caleb-lindgren commented 4 years ago

You're running into this problem because the LUAD proteomics dataframe has a multi-level column index, meaning each column is identified by multiple headers instead of just one. Our tutorial number 4 in the docs folder, which you can view by clicking here, gives an in-depth explanation of what multiindexes are and how to work with them.

To solve your problem in this particular case, you just need to use our reduce_multiindex function to get rid of the second column level. This is also explained in tutorial 4. In your case, the command would be A1BG_cross = en.reduce_multiindex(A1BG_cross, levels_to_drop="Database_ID"). Just execute that before you create the plot, and it should work fine.
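To see what a multi-level column index looks like and why dropping a level helps, here is a toy illustration in plain pandas. The values and the Database_ID entries are made up, and the sketch does not call cptac at all; reduce_multiindex may do more than the single droplevel call shown here.

```python
import pandas as pd

# Toy table whose columns have two levels, Name and Database_ID.
# All values and IDs below are placeholders, not real CPTAC data.
columns = pd.MultiIndex.from_tuples(
    [("A1BG_proteomics", "placeholder_id"), ("A1BG_transcriptomics", "")],
    names=["Name", "Database_ID"],
)
df = pd.DataFrame([[0.1, 5.2], [0.4, 6.1], [-0.3, 4.8]], columns=columns)

# Each column label is a tuple here, not a single string.
first_label = df.columns[0]

# Dropping the Database_ID level leaves one plain string label per column.
flat = df.copy()
flat.columns = flat.columns.droplevel("Database_ID")
flat_labels = list(flat.columns)
```

With tuple labels, an expression like A1BG_cross.columns[0] returns a tuple of strings rather than one column name, which is likely why the plotting call ends up trying to reduce over text. Once the extra level is gone, the same expression returns an ordinary string.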

caleb-lindgren commented 4 years ago

Sorry, your command would be A1BG_cross = en.reduce_multiindex(A1BG_cross, levels_to_drop="Database_ID"), using en instead of lu. I updated the original comment to use en.

a00101 commented 4 years ago

@caleb-lindgren Thank you very much. It all works well.

Can I ask one more thing, if you don't mind? How can I customize my own data, including transcriptome, proteome, and phosphoproteome data, for use with the cptac library? Thanks.

caleb-lindgren commented 4 years ago

I'm glad it works.

Just to clarify, are you saying that you want to use the package to access your own data that you have generated, or are you asking about customizing the data that's already in the cptac package? Could you give me an example of what you mean when you say you want to "customize" the data?

a00101 commented 4 years ago

Sorry for my vagueness. I meant that I want to use cptac to access my own data that I have already generated.

samuelpayne commented 4 years ago

Does your data fit the proteogenomic pattern?

(Sent by email in reply to a00101's comment of Tue, May 26, 2020 at 10:57 PM.)

a00101 commented 4 years ago

If by 'fit' you mean having several types of data, then yes. I have four datasets (mRNA, proteome, phosphoproteome, methylation).

samuelpayne commented 4 years ago

We've thought about how to do this in the abstract, but have not done it, so the instructions below will take a bit of work to implement. First, fork the repo. Then add a new class for your tumor type that inherits from the "Dataset" class. You can see how this is done by looking at endometrial or other classes. You have to write the loading methods and anything else you want to override from the base class. Then you have to install your new version using

pip install .

It may be a bit of work to figure out how to get it to load data from your hard drive and not our Box URLs as is done for other tumors.
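The general shape of that approach can be sketched as below. This is a dependency-free illustration of the pattern only: BaseDataset is a stand-in written for this example and is not cptac's real Dataset class, whose constructor and loading methods may look quite different. MyTumor, get_table, and the one-file-per-omics-type layout are likewise assumptions.

```python
import csv
from pathlib import Path


class BaseDataset:
    """Hypothetical stand-in for a base class; NOT cptac's real Dataset."""

    def __init__(self, cancer_type):
        self._cancer_type = cancer_type
        self._data = {}  # maps a table name to its rows

    def get_table(self, name):
        if name not in self._data:
            raise KeyError(f"{name} is not available for {self._cancer_type}")
        return self._data[name]


class MyTumor(BaseDataset):
    """Hypothetical subclass that loads tab-separated files from a local folder."""

    def __init__(self, data_dir):
        super().__init__("mytumor")
        # Assumed layout: one file per omics type, e.g. proteomics.tsv.
        for path in sorted(Path(data_dir).glob("*.tsv")):
            with path.open(newline="") as handle:
                reader = csv.DictReader(handle, delimiter="\t")
                self._data[path.stem] = list(reader)
```

A real version would more likely store pandas dataframes and follow whatever hooks the package's base class defines. The point here is only that the subclass owns the loading step, so it can read from the local disk instead of a remote URL.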

caleb-lindgren commented 4 years ago

Also, this will probably be very helpful to you: We have recently published all of our developer documentation as part of the repository. The files are located here. They walk through all the individual steps of creating new datasets and making other changes and updates to the package.

After you fork the repository, I would recommend first reading through 00_why_we_did_what_we_done.md, as it will provide a good overview of the package's structure. Then, you'd basically just need to set up your remote storage location and add your data by following the instructions in 02_add_new_dataset.md.