Closed HWaymentSteele closed 1 year ago
Hi, you can currently use the source code to batch aminoBERT and RGN2 predictions. To do this, create a directory with your input fasta files and use the parse_fastas method followed by the aminobert_predict method in rgn2/aminobert/prediction.py, as opposed to aminobert_predict_sequence, which is used in the notebook. The remainder of the workflow is the same.
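In case it helps with preparing the input directory, here is a minimal sketch for checking which sequences a directory of fasta files contains before handing it to the batch methods. It is standalone and does not call anything in rgn2: `read_fasta`, `collect_sequences`, and the accepted file suffixes are hypothetical names and choices made for illustration, not the behavior of parse_fastas.

```python
from pathlib import Path

# Assumption: these suffixes mark FASTA files; adjust to match your data.
FASTA_SUFFIXES = {".fasta", ".fa", ".faa"}


def read_fasta(path):
    """Return a list of (record_id, sequence) tuples from one FASTA file."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    records.append((header, "".join(chunks)))
                # Use the first whitespace-delimited token as the record id.
                header, chunks = (line[1:].split() or [""])[0], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def collect_sequences(fasta_dir):
    """Map record id -> sequence for every FASTA file found in fasta_dir."""
    sequences = {}
    for path in sorted(Path(fasta_dir).iterdir()):
        if path.suffix.lower() not in FASTA_SUFFIXES:
            continue
        for record_id, seq in read_fasta(path):
            if record_id in sequences:
                raise ValueError(f"duplicate record id {record_id!r} in {path}")
            sequences[record_id] = seq
    return sequences
```

Calling `collect_sequences("my_fastas")` on your input directory gives a quick sanity check for duplicate ids or empty sequences before starting a long batch run.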
Thank you!
I am trying to set this up on my own. I'm a little confused by this cell running protling.py -- what input from the earlier steps does it need to read?
#@title Run RGN2
#This step generates the raw RGN2-predicted C-alpha trace.
rgn2_env_init = 'source /opt/conda/etc/profile.d/conda.sh && conda init && conda activate rgn2'
try:
    with io.capture_output() as captured:
        cmd = (f"python rgn/protling.py {os.path.join(RUN_DIR, 'configuration')} "
               f"-p -e 'weighted_testing' -a -g 0")
        %shell {rgn2_env_init} && {cmd}
except subprocess.CalledProcessError:
    print(captured)
    raise
print('Prediction completed!')
Sorry, I must have missed this. I assume you have resolved it by now, but the only input needed is the path to the configuration file within the run directory. The aminobert step in the notebook creates the input TFRecord dataset for your sequences and puts it in a data directory for RGN2, so there is no need to specify it in the run command.
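For anyone running this step outside Colab, where the %shell magic is not available, the same command can be assembled in plain Python. This is only a sketch: the flags are copied verbatim from the notebook cell quoted above, while `build_rgn2_command` and `run_rgn2` are hypothetical helper names, and it assumes you launch it from the directory that contains rgn/protling.py.

```python
import os
import shlex
import subprocess


def build_rgn2_command(run_dir):
    """Build the protling.py argument list used by the notebook cell.

    The only input is the configuration file inside the run directory.
    """
    config_path = os.path.join(run_dir, "configuration")
    return ["python", "rgn/protling.py", config_path,
            "-p", "-e", "weighted_testing", "-a", "-g", "0"]


def run_rgn2(run_dir, env_init=None):
    """Run the command through bash, optionally after an environment setup string.

    env_init is whatever activates your environment, for example the
    rgn2_env_init string from the notebook. Raises CalledProcessError on failure.
    """
    cmd = shlex.join(build_rgn2_command(run_dir))
    if env_init:
        cmd = f"{env_init} && {cmd}"
    return subprocess.run(["bash", "-c", cmd], check=True,
                          capture_output=True, text=True)
```

If the run fails, the captured output is available on the raised exception as `exc.stdout` and `exc.stderr`, which plays the same role as printing `captured` in the notebook.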
Hi, are there any plans to make RGN2 available to run sequences in batch mode, either in a notebook or as downloadable source code? It would be helpful for benchmarking. Thank you!