janxkoci closed this issue 1 year ago
In order to fix this, I need to understand the directory structure that you're using.
In my own work, each data set has its own directory, which contains the main data file (data.opf), a "boot" subdirectory for all the bootstrap replicates, and a subdirectory for each model that I fit to the data. The directory for each model contains (among other things), the .legofit files, the .bepe file, and the .flat file. To create the .bepe and .flat files, I "cd" into the directory for the relevant model and run "bepe" and "flatfile.py". Because all the .legofit files are local to that directory, the .bepe and .flat files do not end up with pathnames containing "/" characters.
In order to make this work for you, I need to understand your work flow and how your files are organized into directories and subdirectories.
If I understand correctly, you must have run "bepe" and "flatfile.py" from different directories. Is that necessary?
> you must have run "bepe" and "flatfile.py" from different directories
No actually, that is the problem - I run them from the same parent directory, but the Python tools include relative paths to input files, while C tools only include base names of the input files. This is really the only problem.
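To make the mismatch concrete, here is a minimal Python sketch. The two label strings are hypothetical stand-ins for how the same .legofit file could be named in the output of a C tool versus a Python tool when both are run from the parent directory; the actual output formats of the tools are not reproduced here.

```python
import os.path

# Hypothetical labels for the same input file, as seen from the parent folder.
label_from_c_tool = "1A_1_boot0.legofit"                  # base name only
label_from_py_tool = "hcom3_atgc/1A/1A_1_boot0.legofit"   # relative path kept

# A plain string comparison treats these as two different files.
same_as_written = label_from_c_tool == label_from_py_tool

# Comparing base names recovers the match.
same_by_basename = label_from_c_tool == os.path.basename(label_from_py_tool)

print(same_as_written, same_by_basename)
```

Any downstream tool that matches files by the literal label would see the first comparison, which is why the two outputs fail to line up.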
Currently, I have one parent folder with lgo model files and scripts, and subfolders for different datasets. Within each dataset folder there are subfolders for input data (e.g. hcom3_atgc/data/data.opf and hcom3_atgc/data/boot*.opf) and model outputs (hcom3_atgc/1A/*.state and hcom3_atgc/1A/*.legofit).
Then I have a bash script that takes the relative path as argument and collects all info I am interested in:
#!/bin/bash

## USAGE
# bash collect_model.sh datafilter/model stage

## READ ARGS
modeldir=$1     # hcom3_tv/1Bc/
stage=${2:-1}   # default=1

data=$(echo $modeldir | tr "/" "\t" | cut -f 1)    # hcom3_tv
model=$(echo $modeldir | tr "/" "\t" | cut -f 2)   # 1Bc
datadir=$data/data                                 # hcom3_tv/data

input=${modeldir}/${model}_${stage}
output=${data}/${model}_${stage}

bepe \
    ${datadir}/data.opf \
    ${datadir}/boot*.opf \
    -L ${input}_data.legofit \
    ${input}_boot*.legofit > ${output}.bepe

resid \
    ${datadir}/data.opf \
    ${datadir}/boot*.opf \
    -L ${input}_data.legofit \
    ${input}_boot*.legofit > ${output}.resid

flatfile.py \
    ${input}_data.legofit \
    ${input}_boot*.legofit > ${output}.flatfile

bootci.py \
    ${output}.flatfile > ${output}.bootci
The files for bepe, flat, bootci etc. are then in the dataset subfolders and named based on the model, e.g. hcom3_atgc/1A_1.bepe.
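For readers who want to check which paths the script ends up using, the variable assignments can be mirrored in a few lines of Python. This is only an illustrative sketch: the function name derive_paths is made up here, and unlike the shell version it normalizes a trailing slash in the model directory instead of producing a double slash.

```python
from pathlib import PurePosixPath


def derive_paths(modeldir: str, stage: int = 1) -> dict:
    """Mirror the path variables of collect_model.sh (hypothetical helper)."""
    # "hcom3_tv/1Bc/" -> ("hcom3_tv", "1Bc"); the trailing slash is dropped.
    parts = PurePosixPath(modeldir).parts
    data, model = parts[0], parts[1]
    return {
        "datadir": f"{data}/data",                    # where data.opf and boot*.opf live
        "input": f"{data}/{model}/{model}_{stage}",   # prefix of the .legofit files
        "output": f"{data}/{model}_{stage}",          # prefix of .bepe, .flatfile, ...
    }


print(derive_paths("hcom3_tv/1Bc/"))
```

With "hcom3_atgc/1A" and the default stage, the output prefix comes out as hcom3_atgc/1A_1, which matches the hcom3_atgc/1A_1.bepe example.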
> To create the .bepe and .flat files, I "cd" into the directory for the relevant model and run "bepe" and "flatfile.py". Because all the .legofit files are local to that directory, the .bepe and .flat files do not end up with pathnames containing "/" characters.
I can try to include the cd step before I run flatfile.py (and cd ../ back afterwards). But this may not be intuitive to other users, so fixing the inconsistency may be a better approach. Note that bepe and other C tools don't need this tweak, because they omit paths to input files in their output.
I can also show the folder structure with Miller columns:
Here, I have a parent folder with lgo files and scripts to submit job arrays and collect results, and subfolders for different datasets (left column), each containing data/{data,boot{0..49}}.opf and results summarized with bepe, resid, flatfile.py and so on (middle column), and finally model outputs themselves in subfolders for each model (right column).
Sorry to have been so slow on this.
I've now changed flatfile.py so that it prints basenames rather than pathnames. The new code is in the devlp branch. If it works for you, I'll merge to master.
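The idea behind that change can be sketched as follows. This is not the code from the devlp branch, only an illustration of printing basenames instead of pathnames; the function name display_names and the example file names are hypothetical.

```python
import os.path


def display_names(paths):
    """Return the base name of each input path, dropping any directories."""
    return [os.path.basename(p) for p in paths]


# Hypothetical inputs, given relative to the parent directory.
inputs = [
    "hcom3_atgc/1A/1A_1_data.legofit",
    "hcom3_atgc/1A/1A_1_boot0.legofit",
]
print(display_names(inputs))  # ['1A_1_data.legofit', '1A_1_boot0.legofit']
```

One consequence of this design is that inputs from different directories that share a file name become indistinguishable in the output, so it relies on file names being unique within a single run.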
Thanks so much - booma now accepts bepe and flatfiles created from the same parent directory without problems!
Hi Alan,
I keep running into one issue regarding an inconsistency between C tools like bepe and Python tools like flatfile.py. It comes up a lot when I try to run booma, but may appear elsewhere too (I guess, I will let you know if I find another example).

Basically, I keep model outputs in separate subfolders and collect results and summaries from the parent folder (I usually collect these for multiple models at a time, so doing it from the parent folder is more convenient).

The problem arises because tools like bepe include plain filenames within their output, while the Python tools (like flatfile.py) also include paths of the input files. This leads to booma errors like this one:

I think it's easier to fix this in the Python tools, so I'm guessing this would be the code to look at first: https://github.com/alanrogers/legofit/blob/8cbbaaa7c744a42a2434203dec82cc94e5f0e0f5/src/flatfile.py#L122-L125
Thanks for considering this tweak.