martin-raden closed this 6 years ago
Hey, I think the best option would be to add a command-line switch that first checks whether Jalview is present: if it is, the corresponding output files are created; if not, it just prints a warning. I would recommend not adding the Jalview recipe to the default CopraRNA dependencies.
OK, the script already checks whether Jalview is available and aborts with an error message otherwise.
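A minimal sketch of the two behaviours discussed so far, assuming the executable is called `jalview` and is found via the `PATH`. A standalone post-processing script aborts when Jalview is missing, while a pipeline switch would only warn and skip the figures. The function name `check_jalview` and its `abort|warn` mode argument are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check whether a 'jalview' executable is on the PATH.
#   mode 'abort' -> print an error and exit (standalone script behaviour)
#   mode 'warn'  -> print a warning and return 1 so the caller can skip figures
check_jalview() {
    local mode=${1:-abort}
    if command -v jalview >/dev/null 2>&1; then
        return 0
    fi
    if [ "$mode" = "warn" ]; then
        echo >&2 "WARNING: 'jalview' not found, skipping alignment figures"
        return 1
    fi
    echo >&2 "ERROR: 'jalview' not found, aborting"
    exit 1
}

# Example: pipeline-style call that only warns
if check_jalview warn; then
    echo "jalview found: creating figures"
else
    echo "no figures created"
fi
```

Keeping the check inside the script means it still works when called on its own, independent of any pipeline switch.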
@eggzilla do you think the script should be part of the GitHub distribution?
I would add the script to coprarna_aux, but maybe do the check with the other sanity checks in the main script. Then they are bundled in one place :-)
Hmm, I disagree, since I see this script as an optional post-processing script for a CopraRNA job. So IMO it should not be run by the CopraRNA pipeline itself...
I am fine with both solutions; however, Jens' false-positive removal post-processing script can now also be triggered by a command-line switch (--ooifilt). Or do you mean adding the switch and leaving the check in the script, so that it also works on its own?
The second script, for the top-X cleanup, is ready:
```bash
#!/usr/bin/env bash
#####################################
#
# DEPENDENCY: CopraRNA called with '-websrv', which should result in
# - subdir 'evo_alignments'
# - file 'CopraRNA_result.map_evo_align'
#
# 1) removes all folders from 'evo_alignments' that are not named in 'CopraRNA_result.map_evo_align'
#
# 2) renames all folders in 'evo_alignments' (and their content) to 'rank_X' where X is the line number in 'CopraRNA_result.map_evo_align'
#
#####################################
IDFILE=CopraRNA_result.map_evo_align
EVODIR=evo_alignments
# check for required data
test -e "$IDFILE" || { echo >&2 "ERROR: expected file '$IDFILE' not found.."; exit 1; }
test -d "$EVODIR" || { echo >&2 "ERROR: expected subdir '$EVODIR' not found.."; exit 1; }
# generate list of folder IDs to keep (all lines consisting only of digits)
RANKEDIDS=$(grep -E '^[0-9]+$' "$IDFILE" | tr '\n' ' ')
# generate search string for ranked IDs enclosed by '_', e.g. "_42_7_"
# ($RANKEDIDS is left unquoted on purpose to collapse the whitespace)
RANKEDIDPATTERN=$(echo $RANKEDIDS | tr ' ' '_')
RANKEDIDPATTERN="_${RANKEDIDPATTERN}_"
# process all subfolders
for d in "$EVODIR"/*; do
    [ -e "$d" ] || continue                 # empty folder: glob did not match
    # subfolder name, and its ID = part of the name before the first '_'
    CURSUBDIR=$(basename "$d")
    CURID=${CURSUBDIR%%_*}
    # check if ID is in RANKEDIDS
    if [[ -n $CURID && $RANKEDIDPATTERN == *"_${CURID}_"* ]]; then
        # get rank of ID (= its line number in $IDFILE, first match only)
        CURRANK=$(grep -n -x -F -- "$CURID" "$IDFILE" | head -n 1 | cut -d ':' -f 1)
        # rename files in folder from '${CURSUBDIR}_*' to 'rank_${CURRANK}_*'
        for file in "$d"/*; do
            [ -e "$file" ] || continue
            name=$(basename "$file")
            newname=${name/"${CURSUBDIR}_"/rank_${CURRANK}_}
            [ "$newname" = "$name" ] || mv -- "$file" "$d/$newname"
        done
        # rename folder to 'rank_$CURRANK', keeping it inside $EVODIR
        mv -- "$d" "$EVODIR/rank_${CURRANK}"
    else
        # remove subfolder
        rm -rf -- "$d"
    fi
done
```
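Since the cleanup deletes folders with `rm -rf`, a dry run can help to check the result first. The following is a hedged sketch, not part of the original script: the function name `preview_cleanup` is hypothetical, and it only assumes the conventions of the script above (ranked IDs are the digit-only lines of the map file, rank X is the line number of the ID, and the folder ID is the part of the folder name before the first `_`).

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints what the top-X cleanup would do,
# without touching any files.
#   $1 = map file (e.g. CopraRNA_result.map_evo_align)
#   $2 = alignment folder (e.g. evo_alignments)
preview_cleanup() {
    local idfile=$1 evodir=$2 d name id rank
    for d in "$evodir"/*; do
        [ -e "$d" ] || continue
        name=$(basename "$d")
        id=${name%%_*}
        # line number of the first digit-only line equal to the ID
        # (string comparison, so '007' does not match '7')
        rank=$(awk -v id="$id" '/^[0-9]+$/ && ($0 "") == (id "") { print NR; exit }' "$idfile")
        if [ -n "$rank" ]; then
            echo "keep   $name -> rank_$rank"
        else
            echo "remove $name"
        fi
    done
}
```

Usage, from a CopraRNA result directory: `preview_cleanup CopraRNA_result.map_evo_align evo_alignments`.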
@PatrickRWright @JensGeorg I discussed the following pipeline with @eggzilla:
- the standard CopraRNA2.pl call should avoid an unnecessary explosion of the number of files on the user's hard drive. Thus, at the end, it should pack the `evo_alignments` folder into `evo_alignments.zip`, e.g. using `zip -rmq evo_alignments evo_alignments 2>&1` (quietly packs and removes the folder afterwards), since 99% of the users will never touch the alignments
- the webserver will unzip `evo_alignments.zip`
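A small sketch of both sides of this data handling, with hypothetical function names. The `zip` flags are the standard Info-ZIP ones: `-r` recurses into the folder, `-m` deletes the originals once they are in the archive, and `-q` suppresses output; `zip` appends `.zip` to the archive name automatically.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: pack 'evo_alignments' at the end of a CopraRNA run,
# and unpack it again on the webserver side before post-processing.
pack_evo_alignments() {
    local dir=${1:-evo_alignments}
    [ -d "$dir" ] || return 0        # nothing to pack
    zip -rmq "$dir" "$dir" 2>&1      # creates "$dir.zip" and removes "$dir"
}

unpack_evo_alignments() {
    local zipfile=${1:-evo_alignments.zip}
    [ -f "$zipfile" ] || { echo >&2 "ERROR: '$zipfile' not found"; return 1; }
    unzip -oq "$zipfile"             # -o: overwrite without prompting
}
```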
Both scripts will be part of the webserver post-processing pipeline and don't have to be integrated into the CopraRNA package. Only the Jalview call will be added to the documentation for the sake of completeness.
what do you think?
The data management sounds good to me. I think the Jalview part is only needed for the webserver and does not need to be part of the CopraRNA package for now.
From a mid-term perspective: I am currently thinking about evolutionary stuff (not touching the original CopraRNA prediction) as a post-processing step, which might need additional resources.
OK, I have added the zipping to the CopraRNA cleanup in #9.
I will work on the scripts for the webserver.
@JensGeorg let me know what you have in mind when settled. ;)
Hi @PatrickRWright @eggzilla @JensGeorg,
we now have Jalview available via bioconda, and I have created a small shell script (see below) that produces, for all subdirs of `evo_alignments` (the default, or a provided dir), the corresponding `svg|eps|png` figures (default: `svg` only). So a few questions come up:
- … `jalview`?

For the webserver I will:
- reduce the `evo_alignments` folder to the entries from the final `websrv` result table
- rename the remaining folders and their files to `rank_x`/`rank_x_...`
Anything forgotten?
Please give me your feedback, thanks!
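For illustration only, a rough sketch of the figure loop described above; it is not the actual script. The function name `make_figures`, the `*.aln` file pattern and the comma-separated format argument are hypothetical, and the headless Jalview call (`-nodisplay -open FILE -svg|-eps|-png OUT`) is an assumption to verify against the command-line documentation of the installed Jalview version. `DRY_RUN=1` only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: for every subdir of an alignment folder (default 'evo_alignments'),
# render each alignment to the requested formats (default 'svg').
# ASSUMPTIONS: alignment files match '*.aln'; Jalview accepts
#   jalview -nodisplay -open FILE -<fmt> OUTFILE
make_figures() {
    local dir=${1:-evo_alignments} formats=${2:-svg} sub aln fmt
    local -a cmd
    [ -d "$dir" ] || { echo >&2 "ERROR: '$dir' not found"; return 1; }
    # validate requested formats (comma-separated, e.g. "svg,png")
    for fmt in ${formats//,/ }; do
        case $fmt in
            svg|eps|png) ;;
            *) echo >&2 "ERROR: unsupported format '$fmt'"; return 1 ;;
        esac
    done
    for sub in "$dir"/*/; do
        [ -d "$sub" ] || continue
        for aln in "$sub"*.aln; do
            [ -e "$aln" ] || continue
            for fmt in ${formats//,/ }; do
                cmd=(jalview -nodisplay -open "$aln" "-$fmt" "${aln%.aln}.$fmt")
                if [ "${DRY_RUN:-0}" = 1 ]; then
                    echo "${cmd[*]}"      # only print the command
                else
                    "${cmd[@]}"
                fi
            done
        done
    done
}
```

Example: `DRY_RUN=1 make_figures evo_alignments svg,png` lists one Jalview call per alignment and format without running them.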