Murali-group / Beeline

BEELINE: evaluation of algorithms for gene regulatory network inference
GNU General Public License v3.0
171 stars, 53 forks

Problems running GRNBoost and GENIE3 on the example data #37

Closed: sotolm closed this issue 3 years ago

sotolm commented 4 years ago

Hello,

I tried running all the methods supported in BEELINE on your example GSD data, following the steps in your documentation. PIDC runs fine, but I am getting the following errors for GENIE3 and GRNBoost2:

docker run --rm -v /Users/sotolm/Documents/PR-scGRN/Beeline:/data pidc:base /bin/sh -c "time -v -o data/outputs/example/GSD/PIDC/time.txt julia runPIDC.jl data/inputs/example/GSD/PIDC/ExpressionData.csv data/outputs/example/GSD/PIDC/outFile.txt "
  4.710384 seconds (11.83 M allocations: 3.117 GiB, 7.00% gc time)
  3.156385 seconds (10.83 M allocations: 531.753 MiB, 7.40% gc time)
docker run --rm -v /Users/sotolm/Documents/Beeline:/data/ --expose=41269 arboreto:base /bin/sh -c "time -v -o data/outputs/example/GSD/GENIE3/time.txt python runArboreto.py --algo=GENIE3 --inFile=data/inputs/example/GSD/GENIE3/ExpressionData.csv --outFile=data/outputs/example/GSD/GENIE3/outFile.txt "
Traceback (most recent call last):
  File "runArboreto.py", line 43, in <module>
    main(sys.argv)
  File "runArboreto.py", line 32, in main
    network = genie3(inDF, client_or_address = client)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 73, in genie3
    limit=limit, seed=seed, verbose=verbose)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 115, in diy
    expression_matrix, gene_names, tf_names = _prepare_input(expression_data, gene_names, tf_names)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 214, in _prepare_input
    expression_matrix = expression_data.as_matrix()
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py", line 5274, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'as_matrix'

docker run --rm -v /Users/sotolm/Documents/PR-scGRN/Beeline:/data/ --expose=41269 arboreto:base /bin/sh -c "time -v -o data/outputs/example/GSD/GRNBOOST2/time.txt python runArboreto.py --algo=GRNBoost2 --inFile=data/inputs/example/GSD/GRNBOOST2/ExpressionData.csv --outFile=data/outputs/example/GSD/GRNBOOST2/outFile.txt "
Traceback (most recent call last):
  File "runArboreto.py", line 43, in <module>
    main(sys.argv)
  File "runArboreto.py", line 36, in main
    network = grnboost2(inDF, client_or_address = client)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 41, in grnboost2
    early_stop_window_length=early_stop_window_length, limit=limit, seed=seed, verbose=verbose)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 115, in diy
    expression_matrix, gene_names, tf_names = _prepare_input(expression_data, gene_names, tf_names)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 214, in _prepare_input
    expression_matrix = expression_data.as_matrix()
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py", line 5274, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'as_matrix'

Additionally, I get this error for GENIE3:

Traceback (most recent call last):
  File "BLRunner.py", line 77, in <module>
    main()
  File "BLRunner.py", line 71, in main
    evaluation.runners[idx].parseOutput()
  File "/Users/sotolm/Documents/Beeline/BLRun/runner.py", line 90, in parseOutput
    OutputParser[self.name](self)
  File "/Users/sotolm/Documents/Beeline/BLRun/genie3Runner.py", line 60, in parseOutput
    OutDF = pd.read_csv(outDir+'outFile.txt', sep = '\t', header = 0)
  File "/Users/sotolm/usr/local/bin/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/sotolm/usr/local/bin/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 448, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/Users/sotolm/usr/local/bin/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 880, in __init__
    self._make_engine(self.engine)
  File "/Users/sotolm/usr/local/bin/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/Users/sotolm/usr/local/bin/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1891, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 374, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 673, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File outputs/example/GSD/GENIE3/outFile.txt does not exist: 'outputs/example/GSD/GENIE3/outFile.txt'

I believe the first two errors are caused by the pandas version: DataFrame.as_matrix() was deprecated in pandas 0.23.0 and removed in 1.0.0, so the AttributeError means the code is running against pandas 1.0 or later, even though the version listed in requirements.txt is 0.23.4. I tried changing the pandas version in requirements.txt, but it clashes with the versions of other packages.
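As a minimal sketch of the API change (not BEELINE or Arboreto code; the helper name dataframe_to_matrix and the toy data are hypothetical), the NumPy array that as_matrix() used to return can be obtained from the .values attribute or, from pandas 0.24 on, from to_numpy():

```python
import numpy as np
import pandas as pd


def dataframe_to_matrix(df: pd.DataFrame) -> np.ndarray:
    """Return the DataFrame's data as a NumPy array.

    Hypothetical stand-in for the removed DataFrame.as_matrix():
    uses to_numpy() (added in pandas 0.24) when it exists and
    falls back to the .values attribute on older releases.
    """
    if hasattr(df, "to_numpy"):
        return df.to_numpy()
    return df.values


# Toy expression matrix with made-up values: rows are cells, columns are genes,
# the orientation Arboreto's genie3/grnboost2 take as input.
expr = pd.DataFrame(
    {"g1": [0.1, 0.4, 0.9], "g2": [1.2, 0.0, 0.3]},
    index=["cell1", "cell2", "cell3"],
)

matrix = dataframe_to_matrix(expr)
print(type(matrix).__name__, matrix.shape)  # ndarray (3, 2)
```

Either spelling works on pandas 1.x and 2.x; .values also works on the older 0.2x releases, which is why it is the more conservative drop-in replacement.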

sergio-vasquez537 commented 4 years ago

I have had the same issue, and again it only arises for GENIE3 and GRNBoost2. I created my own virtual environment with Python 2.7 so that I could use pandas 0.22 (which I thought was the last version to support as_matrix()). I then edited requirements.txt to avoid conflicts, downgrading rpy, matplotlib and scikit-learn to the last versions that support Python 2.7. Sadly, I still get the same error; any help would be very welcome. Here is the message:

docker run --rm -v /home/sergiovasquez/Beeline:/data/ --expose=41269 arboreto:base /bin/sh -c "time -v -o data/outputs/114/114_gene_set/GENIE3/time.txt python runArboreto.py --algo=GENIE3 --inFile=data/inputs/114/114_gene_set/GENIE3/ExpressionData.csv --outFile=data/outputs/114/114_gene_set/GENIE3/outFile.txt "
Traceback (most recent call last):
  File "runArboreto.py", line 43, in <module>
    main(sys.argv)
  File "runArboreto.py", line 32, in main
    network = genie3(inDF, client_or_address = client)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 73, in genie3
    limit=limit, seed=seed, verbose=verbose)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 115, in diy
    expression_matrix, gene_names, tf_names = _prepare_input(expression_data, gene_names, tf_names)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 214, in _prepare_input
    expression_matrix = expression_data.as_matrix()
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py", line 5274, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'as_matrix'
docker run --rm -v /home/sergiovasquez/Beeline:/data/ --expose=41269 arboreto:base /bin/sh -c "time -v -o data/outputs/114/114_gene_set/GRNBOOST2/time.txt python runArboreto.py --algo=GRNBoost2 --inFile=data/inputs/114/114_gene_set/GRNBOOST2/ExpressionData.csv --outFile=data/outputs/114/114_gene_set/GRNBOOST2/outFile.txt "
Traceback (most recent call last):
  File "runArboreto.py", line 43, in <module>
    main(sys.argv)
  File "runArboreto.py", line 36, in main
    network = grnboost2(inDF, client_or_address = client)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 41, in grnboost2
    early_stop_window_length=early_stop_window_length, limit=limit, seed=seed, verbose=verbose)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 115, in diy
    expression_matrix, gene_names, tf_names = _prepare_input(expression_data, gene_names, tf_names)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 214, in _prepare_input
    expression_matrix = expression_data.as_matrix()
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py", line 5274, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'as_matrix'

adyprat commented 4 years ago

Hi,

Sorry for the delay in getting back to you both. I'll work on pushing an update for this soon.

Changing your virtual environment doesn't help, because GENIE3 and GRNBoost2 run inside the Arboreto Docker container. You are correct that the error comes from a pandas version mismatch: Arboreto calls DataFrame.as_matrix(), which the pandas installed in the container no longer provides. The change therefore has to be made inside the Docker image, not in the BEELINE environment from which you invoke BLRunner.py. You'll need to downgrade the pandas version in this file: https://github.com/Murali-group/Beeline/blob/master/Algorithms/ARBORETO/Dockerfile and then rebuild the Docker image.

Hope it helps!

Best, Aditya
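To confirm which pandas the failing code actually sees, the check has to run in the same environment as Arboreto, i.e. inside the arboreto:base image rather than on the host. The script below is a hypothetical sketch (the file name check_pandas.py and the helper names are illustrative); the only fact it relies on is that DataFrame.as_matrix() was removed in pandas 1.0.0.

```python
import pandas as pd


def major_minor(version: str) -> tuple:
    """Turn a version string such as '0.23.4' or '2.2.0rc0' into (major, minor)."""
    parts = version.split(".")
    minor_digits = "".join(ch for ch in parts[1] if ch.isdigit()) if len(parts) > 1 else ""
    return int(parts[0]), int(minor_digits or 0)


def as_matrix_expected(version: str) -> bool:
    """DataFrame.as_matrix() was removed in pandas 1.0.0, so 0.x releases still have it."""
    return major_minor(version) < (1, 0)


if __name__ == "__main__":
    print("pandas version:", pd.__version__)
    print("DataFrame.as_matrix present:", hasattr(pd.DataFrame, "as_matrix"))
    print("expected from version:", as_matrix_expected(pd.__version__))
```

For example, with the script saved in the BEELINE directory that the commands in this thread mount at /data, something like docker run --rm -v /path/to/Beeline:/data arboreto:base python /data/check_pandas.py would report the container's pandas (the host path is a placeholder). Running the same script on the host can give a different answer, which is exactly why changing the host environment has no effect.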

sotolm commented 4 years ago

Hi,

Thanks for your response. I tried downgrading the pandas version in the Dockerfile, but it is incompatible with the versions of other libraries. Instead, I installed Arboreto outside BEELINE and changed its source code (the call in arboreto/algo.py shown in the tracebacks) from .as_matrix() to .values, and that worked fine for me.

Best, Larisa
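Larisa's workaround amounts to a one-line edit in arboreto/algo.py (the _prepare_input call in the tracebacks). A minimal sketch of automating that edit follows; patch_source and patch_arboreto_algo are hypothetical helpers, the code assumes Arboreto is importable in the environment where it runs, and it rewrites the installed file in place (keeping a .bak copy), so try it in a scratch environment first.

```python
import importlib.util
import re
from pathlib import Path

# Matches only the zero-argument call; calls such as as_matrix(columns=[...])
# are left untouched because .values takes no arguments.
AS_MATRIX_CALL = re.compile(r"\.as_matrix\(\)")


def patch_source(text: str) -> str:
    """Replace every zero-argument .as_matrix() call with the .values attribute."""
    return AS_MATRIX_CALL.sub(".values", text)


def patch_arboreto_algo() -> bool:
    """Patch arboreto/algo.py in the active environment; return True if it changed."""
    spec = importlib.util.find_spec("arboreto")
    if spec is None or spec.origin is None:
        raise ImportError("arboreto is not installed in this environment")
    algo_path = Path(spec.origin).parent / "algo.py"
    original = algo_path.read_text()
    patched = patch_source(original)
    if patched == original:
        return False
    # Keep an untouched copy next to the original before overwriting it.
    algo_path.with_name(algo_path.name + ".bak").write_text(original)
    algo_path.write_text(patched)
    return True
```

Calling patch_arboreto_algo() once from the environment that runs Arboreto applies the edit. For the BEELINE Docker setup, the same change would have to be made inside the arboreto:base image (for example as a build step), for the reason Aditya gives above.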