Problems when running...

martin-raden commented 7 months ago

Installed cherri via conda
Downloaded the v4 Zenodo file
Extracted the model data via tar -xzf Cherri_models_v4.tar.gz --wildcards ./Cherri_models_v4/Model_with_graph_features/human_rbp/*
Set the model path: MODELPATH=./Cherri_models_v4/Model_with_graph_features/human_rbp
and called cherri with

cherri eval \
-i1 cherri_input.bed \
-g human -l human \
-o ./cherri-out/ -n human_rbp -c 150 -st on \
-m $MODELPATH/full_human_rbp_context_150.model \
-mp $MODELPATH/training_data_human_rbp_context_150.npz \
-i2 $MODELPATH/occupied_regions/occupied_regions.obj

Error 1: output path is not created

Running for you in EVALUATION mode ...

Traceback (most recent call last):
  File "/home/mmann/miniconda/envs/cherri/bin/cherri", line 846, in <module>
    main_eval(args)
  File "/home/mmann/miniconda/envs/cherri/bin/cherri", line 357, in main_eval
    os.mkdir(out_path)
FileNotFoundError: [Errno 2] No such file or directory: './cherri-out//20231222_Cherri_evaluating_RRIs/'

Solved with mkdir ./cherri-out//20231222_Cherri_evaluating_RRIs/

But: this should happen inside the tool, or rather the tool should NOT create a SUBFOLDER at all. That is what specifying the output folder is for. THAT folder, however, should be created!
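
The traceback above shows `os.mkdir(out_path)` failing because the parent folder `./cherri-out/` does not exist yet. A minimal sketch of the usual remedy, assuming the output path is built from the user-supplied folder plus a run-specific subfolder; the helper name `ensure_output_dir` is hypothetical and not part of cherri:

```python
import os
import tempfile


def ensure_output_dir(base_dir, run_name):
    """Create base_dir/run_name including missing parents; return the path."""
    out_path = os.path.join(base_dir, run_name)
    # Unlike os.mkdir, os.makedirs also creates missing parent directories;
    # exist_ok=True makes repeated calls harmless.
    os.makedirs(out_path, exist_ok=True)
    return out_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, "cherri-out")  # does not exist yet
        print(ensure_output_dir(base, "20231222_Cherri_evaluating_RRIs"))
```

Whether the run-specific subfolder should exist at all is the separate design question raised in the comment above; the sketch only removes the crash.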

Error 2: model parsing failed

Traceback (most recent call last):
  File "/home/mmann/miniconda/envs/cherri/bin/cherri", line 846, in <module>
    main_eval(args)
  File "/home/mmann/miniconda/envs/cherri/bin/cherri", line 395, in main_eval
    df_rris = pd.DataFrame(df_content_list,columns=header)
  File "/home/mmann/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 721, in __init__
    arrays, columns, index = nested_data_to_arrays(
  File "/home/mmann/.local/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 519, in nested_data_to_arrays
    arrays, columns = to_arrays(data, columns, dtype=dtype)
  File "/home/mmann/.local/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 883, in to_arrays
    content, columns = _finalize_columns_and_data(arr, columns, dtype)
  File "/home/mmann/.local/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 985, in _finalize_columns_and_data
    raise ValueError(err) from err
ValueError: 34 columns passed, passed data had 33 columns

?! Is the model file broken?

teresa-m commented 6 months ago

Sorry for taking such a long time to reply. Regarding error 2:

Could you please share your cherri_input.bed file with me? Maybe something about setting up the input is not well documented or not correct. Your call works for me when I use the test_file. I hope I can reproduce your error using your input data.
One issue could also be that you downloaded Cherri Zenodo version v4 instead of v5. There was an issue with v4, so I had to upload a v5, but I did not update the folder name. This will be fixed. :-)

Regarding error 1: I can fix it so that, if you give a folder (e.g. cherri.out), this folder itself is used as the output dir, instead of the program-generated folder being added inside it.

martin-raden commented 6 months ago

The bed file looks like this:

start1,stop1,strand1,chrom2,start2,stop2,strand2
8394464,8394515,+,2,231456460,231456515,-
8394358,8394374,+,22,39319070,39319094,-
8256781,8256802,+,16,71758426,71758448,-
8256849,8256861,+,5,181241871,181241883,-
8392683,8392700,+,11,62855033,62855048,-
8393135,8393144,+,15,66502859,66502868,-
8442652,8442665,+,2,206161904,206161917,+
8256851,8256857,+,8,98042112,98042118,-
...

martin-raden commented 6 months ago

It seems to me the chrom1 column is missing... ta-da...

teresa-m commented 6 months ago

Yes, but it is not good that there is no test for this. I will see that I add one while I am working on the output folder anyway. Sorry for that!
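
A check of the kind mentioned here could look roughly like the following sketch. It only inspects the header line of a comma-separated input. The column list is an assumption assembled from the header shown in this thread plus the missing `chrom1`; the authoritative list is whatever the cherri documentation specifies, and `missing_columns` is a hypothetical helper:

```python
import csv
import io

# Assumed column names: the header from this thread plus chrom1.
REQUIRED_COLUMNS = ["chrom1", "start1", "stop1", "strand1",
                    "chrom2", "start2", "stop2", "strand2"]


def missing_columns(handle, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


if __name__ == "__main__":
    broken = io.StringIO("start1,stop1,strand1,chrom2,start2,stop2,strand2\n"
                         "8394464,8394515,+,2,231456460,231456515,-\n")
    print(missing_columns(broken))
```

Reporting the missing names up front would replace the opaque "34 columns passed, passed data had 33 columns" message with something the user can act on.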

martin-raden commented 6 months ago

No problem... with the updated file it starts running... let's see what comes out. 😀🤞

teresa-m commented 6 months ago

Great, thanks a lot! 🤞

martin-raden commented 6 months ago

Hmm... have you seen this error before?

$ cherri eval -i1 cherri_input.bed -g human -l human -o ./cherri-out/ -n human_rbp -c 150 -st on -m $MODELPATH/full_human_rbp_context_150.model -mp $MODELPATH/training_data_human_rbp_context_150.npz -i2 $MODELPATH/occupied_regions.obj

Running for you in EVALUATION mode ...

***Added new folder***

Info: including given occupied regions object

100% [..............................................................................] 11672 / 11672
1. Prepare RRI instances

Traceback (most recent call last):
  File "/home/mmann/miniconda/envs/cherri/bin/cherri", line 846, in <module>
    main_eval(args)
  File "/home/mmann/miniconda/envs/cherri/bin/cherri", line 428, in main_eval
    rl.call_script(call_pos_neg)
  File "/home/mmann/miniconda/envs/cherri/lib/python3.8/site-packages/rrieval/lib.py", line 461, in call_script
    assert not error, "script is complaining:\n%s\n%s" %(call, error)
AssertionError: script is complaining:
generate_pos_neg_with_context.py  -i1 ./cherri-out//20240115_Cherri_evaluating_RRIs/evaluete_RRIs.csv -i2 ./cherri/Cherri_models_v4/Model_with_graph_features/human_rbp/occupied_regions.obj  -d ./cherri-out//20240115_Cherri_evaluating_RRIs/positive_instance/ -g ./cherri-out//20240115_Cherri_evaluating_RRIs//genome/hg38.fa -n human_rbp -c 150 --pos_occ -so 5 -l ./cherri-out//20240115_Cherri_evaluating_RRIs//genome/hg38.chrom.sizes -p /home/mmann/miniconda/envs/cherri/lib/python3.8/site-packages/rrieval/IntaRNA_param/IntaRNA_param.txt -m eval
Traceback (most recent call last):
  File "/home/mmann/.local/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 982, in _finalize_columns_and_data
    columns = _validate_or_indexify_columns(contents, columns)
  File "/home/mmann/.local/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 1030, in _validate_or_indexify_columns
    raise AssertionError(
AssertionError: 1 columns passed, passed data had 17 columns

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/mmann/miniconda/envs/cherri/bin/generate_pos_neg_with_context.py", line 1051, in <module>
    main()
  File "/home/mmann/miniconda/envs/cherri/bin/generate_pos_neg_with_context.py", line 987, in main
    df_pos_data,lost_inst_pos_new, no_less_sub_opt_pos = decode_IntaRNA_call(call_pos,
  File "/home/mmann/miniconda/envs/cherri/bin/generate_pos_neg_with_context.py", line 482, in decode_IntaRNA_call
    df = decode_Intarna_output(out)
  File "/home/mmann/miniconda/envs/cherri/bin/generate_pos_neg_with_context.py", line 243, in decode_Intarna_output
    df = pd.DataFrame([line], columns=col)
  File "/home/mmann/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 721, in __init__
    arrays, columns, index = nested_data_to_arrays(
  File "/home/mmann/.local/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 519, in nested_data_to_arrays
    arrays, columns = to_arrays(data, columns, dtype=dtype)
  File "/home/mmann/.local/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 883, in to_arrays
    content, columns = _finalize_columns_and_data(arr, columns, dtype)
  File "/home/mmann/.local/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 985, in _finalize_columns_and_data
    raise ValueError(err) from err
ValueError: 1 columns passed, passed data had 17 columns

teresa-m commented 6 months ago

No, I don't think so, but I will take a look at it right after lunch.

martin-raden commented 6 months ago

Here is the bed file: cherri_input.zip

martin-raden commented 6 months ago

and here is the call

MODELPATH=./cherri/Cherri_models_v4/Model_with_graph_features/human_rbp

cherri eval -i1 cherri_input.bed -g human -l human -o ./cherri-out/ -n human_rbp -c 150 -st on -m $MODELPATH/full_human_rbp_context_150.model -mp $MODELPATH/training_data_human_rbp_context_150.npz -i2 $MODELPATH/occupied_regions.obj

teresa-m commented 6 months ago

For a few positions, start and end were swapped in the input file, i.e. the start position was greater than the end position. I changed that, and now the script runs through without errors for me. Can you test it again? cherri_input.csv
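
Swapped coordinates like these can be caught before anything is passed on to IntaRNA. A minimal sketch, assuming a comma-separated input with `start1`/`stop1` and `start2`/`stop2` columns as in the header shown earlier in this thread; `swapped_rows` is a hypothetical helper and the sample rows are made up:

```python
import csv
import io


def swapped_rows(handle):
    """Return 1-based data-row numbers in which a start exceeds its stop."""
    bad = []
    for row_number, row in enumerate(csv.DictReader(handle), start=1):
        for side in ("1", "2"):
            if int(row["start" + side]) > int(row["stop" + side]):
                bad.append(row_number)
                break  # report each row only once
    return bad


if __name__ == "__main__":
    # Made-up example: the second data row has start2 > stop2.
    sample = io.StringIO(
        "chrom1,start1,stop1,strand1,chrom2,start2,stop2,strand2\n"
        "1,100,150,+,2,200,250,-\n"
        "1,100,150,+,2,300,250,-\n"
    )
    print(swapped_rows(sample))
```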

The error then actually comes from the IntaRNA call, because the positions are swapped:

"# ERROR : seedTRange : '151-130' can not be parsed for sequence index range [1,280] of sequence target\nid1;start1;end1;id2;start2;end2;subseqDP;hybridDP;E;seedStart1;seedEnd1;seedStart2;seedEnd2;seedE;E_hybrid;ED1;ED2\n"

Unfortunately, this is not caught via the stderr of the subprocess. I have now built in that, if there is an ERROR in the IntaRNA output, the error and the call are reported. Hopefully this makes the error faster to find next time.
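
The idea described here, scanning the captured output for error lines instead of relying on stderr alone, can be sketched as follows. This is not the actual cherri code: both function names are hypothetical, and the only assumption about the output format is the `# ERROR` prefix visible in the message quoted above:

```python
def find_tool_errors(output):
    """Return all lines of a tool's output that start with '# ERROR'."""
    return [line for line in output.splitlines() if line.startswith("# ERROR")]


def check_output(output, call):
    """Raise with the error lines and the call if the output reports errors."""
    errors = find_tool_errors(output)
    if errors:
        raise RuntimeError("call failed: %s\n%s" % (call, "\n".join(errors)))
    return output


if __name__ == "__main__":
    # Error line copied from this thread; the header line is shortened.
    sample = ("# ERROR : seedTRange : '151-130' can not be parsed for sequence "
              "index range [1,280] of sequence target\n"
              "id1;start1;end1;id2;start2;end2\n")
    try:
        check_output(sample, "IntaRNA ...")  # placeholder for the real call
    except RuntimeError as err:
        print(err)
```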

teresa-m commented 6 months ago

Short update on error 1 (output path is not created): it is already possible to specify an output folder. There are two parameters, one to set the path and one for the 'output_name'. If you run the following call, the output folder should be named cherri-out and be created in the current directory. I think I split this up back then so that the output is easier for me to read when I run all models one after another.

cherri eval \
-i1 cherri_input.bed \
-g human -l human \
-o ./ -on cherri-out/ -n human_rbp -c 150 -st on \
-m $MODELPATH/full_human_rbp_context_150.model \
-mp $MODELPATH/training_data_human_rbp_context_150.npz \
-i2 $MODELPATH/occupied_regions/occupied_regions.obj

martin-raden commented 6 months ago

> For a few positions, start and end were swapped in the input file, i.e. the start position was greater than the end position. I changed that, and now the script runs through without errors for me. Can you test it again?

Thank you. I fixed my file, or rather used yours. With that it gets a bit further, but then crashes in step 2b:

cherri eval -i1 cherri_input.bed -g human -l human -o . -n human_rbp -c 150 -st on -m $MODELPATH/full_human_rbp_context_150.model -mp $MODELPATH/training_data_human_rbp_context_150.npz -i2 $MODELPATH/occupied_regions.obj

Running for you in EVALUATION mode ...

***Added new folder***

Info: including given occupied regions object

100% [..............................................................................] 11672 / 11672
1. Prepare RRI instances

2. Compute features

2a. Compute graph-kernel features

2b. Classify RRI instances

Traceback (most recent call last):
  File "/home/mmann/miniconda/envs/cherri/bin/cherri", line 846, in <module>
    main_eval(args)
  File "/home/mmann/miniconda/envs/cherri/bin/cherri", line 493, in main_eval
    df_eval = rl.classify(X_filterd, model_file, eval_file, df_ID, False, 'off', True)
  File "/home/mmann/miniconda/envs/cherri/lib/python3.8/site-packages/rrieval/lib.py", line 1343, in classify
    model = loadfile(in_model_filepath)['estimator']
  File "/home/mmann/miniconda/envs/cherri/lib/python3.8/site-packages/ubergauss/tools/__init__.py", line 54, in <lambda>
    loadfile = lambda filename: dill.load(open(filename, "rb"))
  File "/home/mmann/miniconda/envs/cherri/lib/python3.8/site-packages/dill/_dill.py", line 287, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "/home/mmann/miniconda/envs/cherri/lib/python3.8/site-packages/dill/_dill.py", line 442, in load
    obj = StockUnpickler.load(self)
TypeError: __randomstate_ctor() takes from 0 to 1 positional arguments but 2 were given

Do you have any idea?

teresa-m commented 6 months ago

To me it looks as if it cannot read the model file. I will also try your call again on my side with the v5 download.

Ahh, and sorry, I forgot to mention: you can leave out the occupied_regions. You had pointed out to me that it is not a good idea to use them in eval. But apparently I had not changed that in the docs yet. Or rather, I adapted the docs yesterday and will hopefully push them to main today.

martin-raden commented 6 months ago

Maybe there is still something wrong in the call...?!

teresa-m commented 6 months ago

It runs through for me.

cherri eval -i1 ./cherri-out/cherri_input_short.bed -g human -l human -o ./cherri-out/ -n human_rbp -c 150 -st on -m ./test_eval/Cherri_models_v4/Model_without_graph_features/human_rbp//full_human_rbp_context_150.model -mp ./test_eval/Cherri_models_v4/Model_without_graph_features/human_rbp//training_data_human_rbp_context_150.np

I only adapted your paths but otherwise used your call. Are you using the with_graph_features model? Maybe something went wrong during unpacking?

RRI_site_evaluation$ md5sum ./test_eval/Cherri_models_v4/Model_with_graph_features/human_rbp//full_human_rbp_context_150.model
4616a51deea9229855456b24f3e549e8  ./test_eval/Cherri_models_v4/Model_with_graph_features/human_rbp//full_human_rbp_context_150.model
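
The same checksum comparison can be done from Python, which is handy where `md5sum` is not available. A minimal sketch using only the standard library; `md5_of_file` is a hypothetical helper name, and the commented path is the one from the call above:

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to save memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage, to be compared against the checksum posted above:
# md5_of_file("./test_eval/Cherri_models_v4/Model_with_graph_features/"
#             "human_rbp/full_human_rbp_context_150.model")
```

A matching checksum rules out a corrupted download or a failed extraction, but says nothing about whether the local environment can unpickle the file.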

martin-raden commented 6 months ago

The md5 matches. Here is the training data as well, for comparison

0b97422de08253c1ea412db0a53f769c  Model_with_graph_features/human_rbp/training_data_human_rbp_context_150.npz

teresa-m commented 6 months ago

That matches too: 0b97422de08253c1ea412db0a53f769c ./test_eval/Cherri_models_v4/Model_with_graph_features/human_rbp//training_data_human_rbp_context_150.npz

Now I am a bit stumped. Just to check once more, the biofilm version

# packages in environment at /home/teresa/miniconda3/envs/cherri:
#
# Name                    Version                   Build  Channel
biofilm                   0.1.124            pyhd8ed1ab_0    conda-forge

martin-raden commented 6 months ago

Nope, exactly the same...

teresa-m commented 6 months ago

@smautner: do you maybe have an idea?

martin-raden commented 6 months ago

It could (in the worst case) be due to my "Ubuntu Linux on Windows" (WSL). I already had some other small problem with that before...

But at first glance it actually looks more like a package problem... or something like this:

https://stackoverflow.com/questions/23944657/typeerror-method-takes-1-positional-argument-but-2-were-given-but-i-only-pa

teresa-m commented 6 months ago

# packages in environment at /home/teresa/miniconda3/envs/cherri:
#
# Name                    Version                   Build  Channel
dill                      0.3.7              pyhd8ed1ab_0    conda-forge

I can also send you my complete conda list, if that helps?

martin-raden commented 6 months ago

Here is my cherri.env.martin.txt to start with

teresa-m commented 6 months ago

I have Python 3.9.18

Maybe that is the reason... but it would be strange. cherri_conda_list_teresa.txt

teresa-m commented 6 months ago

You only have dill in there? When creating the environment I had problems because at first I did not have the channels in the right order, i.e. first conda-forge and then bioconda. But you probably had that right, given that the environment was created. Maybe you can try the container?

https://quay.io/repository/biocontainers/cherri?tab=tags&tag=latest
docker run -i -t quay.io/biocontainers/cherri:0.8--pyh7cba7a3_0 bash

martin-raden commented 6 months ago

OK... got it working..

We need PYTHON 3.9 !!! Please make sure to update the conda package file...
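
To compare two environments quickly, the relevant versions can be printed from inside each one. The sketch below uses only the standard library (`importlib.metadata` is available from Python 3.8 on). The package list is an assumption: dill, biofilm and pandas appear in this thread, while numpy is included because `__randomstate_ctor` belongs to NumPy's random module, which makes a NumPy version difference a plausible (but here unconfirmed) cause of the unpickling error:

```python
import sys
from importlib import metadata


def environment_report(packages=("numpy", "dill", "biofilm", "pandas")):
    """Return a dict with the Python version and installed package versions."""
    report = {"python": "%d.%d.%d" % sys.version_info[:3]}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # package is not installed in this environment
    return report


if __name__ == "__main__":
    for name, version in environment_report().items():
        print("%-10s %s" % (name, version))
```

Running this in both the failing and the working environment gives a compact diff without exchanging full `conda list` dumps.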

https://github.com/bioconda/bioconda-recipes/blob/master/recipes/cherri/meta.yaml

teresa-m commented 6 months ago

Really good. Thanks!

martin-raden commented 6 months ago

Uhm... confusion... I put in 73 RNA pairs and only get 31 out...?! evaluation_results_human_rbp.csv

teresa-m commented 6 months ago

It may be that no interaction was found for the others, and then we cannot compute the features either. But it is strange that there are so many. I can have the IntaRNA calls printed out, if you think that helps?

martin-raden commented 6 months ago

But then it should still output an "NA" or similar for those cases... so that the number of rows matches... because in eval mode one has concrete questions... I would set the predicted value to NA and the prediction label to "0", because we then assume that no interaction can take place there (without IntaRNA predictions)

martin-raden commented 6 months ago

> It may be that no interaction was found for the others, and then we cannot compute the features either. But it is strange that there are so many. I can have the IntaRNA calls printed out, if you think that helps?

What seed constraints do we have on there? Probably seedBP=7, right?

That could become a problem here with miRNAs etc...

teresa-m commented 6 months ago

5 -> https://github.com/BackofenLab/Cherri/blob/master/rrieval/IntaRNA_param/IntaRNA_param.txt That should be OK for miRNA, right?

martin-raden commented 6 months ago

At the moment we have the following:

72 intarna/risearch2/riblast predictions for which we have a "correct YES/NO" annotation
for 32 of them cherri delivers a prediction
- 12 of those agree with the literature-based YES/NO annotation
- 20 do not...

😒 not exactly great yet...
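
Expressed as fractions, the counts in this comment give a coverage of about 44% and an agreement rate of 37.5% among the covered cases. A small sketch of the arithmetic, using only the numbers stated in this comment (the function name is made up for illustration):

```python
def coverage_and_agreement(total_annotated, with_prediction, agreeing):
    """Return (coverage, agreement) as fractions between 0 and 1."""
    coverage = with_prediction / total_annotated   # share that got a prediction
    agreement = agreeing / with_prediction         # share of those that match
    return coverage, agreement


if __name__ == "__main__":
    coverage, agreement = coverage_and_agreement(72, 32, 12)
    print("coverage:  %.3f" % coverage)    # 32 / 72
    print("agreement: %.3f" % agreement)   # 12 / 32
```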

teresa-m commented 6 months ago

> But then it should still output an "NA" or similar for those cases... so that the number of rows matches... because in eval mode one has concrete questions... I would set the predicted value to NA and the prediction label to "0", because we then assume that no interaction can take place there (without IntaRNA predictions)

Yes, I can still build that in. So for the ones that are missing, I would insert an ID, set predicted to 0, and fill the features with NA?
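
The padding proposed here can be sketched with plain dictionaries. All names are hypothetical: `ID`, `predicted_label` and `predicted_value` stand in for whatever columns the real result table uses, and the feature name in the example is invented:

```python
def pad_missing_predictions(input_ids, predictions, feature_names):
    """Return one row per input ID, padding IDs that got no prediction.

    predictions maps an ID to its result row (a dict). IDs without a row
    get label 0 and "NA" for the predicted value and all features.
    """
    rows = []
    for rri_id in input_ids:
        if rri_id in predictions:
            row = dict(predictions[rri_id])  # copy, keep the real result
        else:
            row = {name: "NA" for name in feature_names}
            row["predicted_label"] = 0
            row["predicted_value"] = "NA"
        row["ID"] = rri_id
        rows.append(row)
    return rows


if __name__ == "__main__":
    found = {"pair_1": {"E": -12.3, "predicted_label": 1, "predicted_value": 0.8}}
    for row in pad_missing_predictions(["pair_1", "pair_2"], found, ["E"]):
        print(row)
```

The output then has exactly one row per input pair, in input order, so results can be joined back to the annotation table line by line.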

teresa-m commented 6 months ago

> At the moment we have the following:
>
> * 72 intarna/risearch2/riblast predictions for which we have a "correct YES/NO" annotation
>
> * for 32 of them cherri delivers a prediction
>
>   * 12 of those agree with the literature-based YES/NO annotation
>   * 20 do not...
>
> 😒 not exactly great yet...

Oh, that is bad. Shall we look at which ones are predicted correctly and which incorrectly?

martin-raden commented 6 months ago

I will take another look in peace and think it over a bit more, and then we can sit down together again and brainstorm

🤞

teresa-m commented 6 months ago

Yes, gladly 🤞

teresa-m commented 6 months ago

Could you maybe try again with just the human model and the full model, and also once without graph features? I think during testing the rbp dataset worked better on the human model. And the without_graph_feature models were a bit better in the inter-model comparison. But that will not make a big difference either.

BackofenLab / Cherri

Problems when running... #63
