Closed rmck1 closed 1 month ago
@rmck1 can you copy in the contents of the file at this location
~/.datatrail/config.yaml
i.e. in your home directory somewhere
Here are the contents of the file: ~/.datatrail/config.yaml
root_mounts:
canfar: /arc/projects/chime_frb/
chime: /
gbo: /
hco: /
kko: /
local: ./
server: https://frb.chimenet.ca/datatrail
site: canfar
vospace_certfile: /arc/home/Mckinven/.ssl/cadcproxy.pem
Yeah so the issue is that the post request to find the relevant filepaths returns filepaths with a mix of
data/chime/baseband/raw/...
and
cadc:CHIMEFRB/data/chime/baseband/raw/..
Now normally code exists to account for this by simply finding and replacing all of those prefixes.
However, in this case, for some reason I don't understand, the result looks something like this ...
'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_725.h5',
'cadc:CHIMEFRB//data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_811.h5',
'cadc:CHIMEFRB//data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_771.h5',
'cadc:CHIMEFRB//data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_773.h5',
'cadc:CHIMEFRB//data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_784.h5',
'cadc:CHIMEFRB//data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_776.h5',
'cadc:CHIMEFRB//data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_775.h5',
'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_7.h5',
'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_757.h5',
'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_926.h5',
'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_939.h5',
'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_948.h5',
'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_949.h5',
...
which upon close inspection shows a different prefix, cadc:CHIMEFRB//data/baseband/raw..., i.e. it has a double forward slash, and so the find and replace catches only part of the prefix, leaving the following mix of filepaths remaining
...
'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_302.h5',
'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_309.h5',
'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_311.h5',
'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_312.h5',
'/data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_319.h5',
'/data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_283.h5',
'/data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_316.h5',
'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_296.h5',
'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_297.h5',
'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_303.h5',
'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_314.h5',
...
with forward slashes at the start of some. As a result there is no common prefix across this set, and the result of that search is common_path=''. Trying to index common_path[-1] on that empty string then results in the string index error. To fix this we can simply search for and replace cadc:CHIMEFRB// first, before finding and replacing the original cadc:CHIMEFRB/, but I'm fairly confounded as to why this happens in the first place, and what is different about these particular events.
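For reference, here is a minimal sketch of the failure and of the reordered replacement, using two of the paths listed above. The variable names are illustrative and are not taken from the datatrail source; only `str.replace` and `os.path.commonprefix` are real calls.

```python
import os

# Two sample URIs copied from the listing above, one of each style.
file_uris = [
    "data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_725.h5",
    "cadc:CHIMEFRB//data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_811.h5",
]

# Current behaviour: only the single-slash prefix is stripped, so the
# double-slash entry is left with a leading "/".
broken = [f.replace("cadc:CHIMEFRB/", "") for f in file_uris]
broken_common = os.path.commonprefix(broken)  # empty: "d" vs "/" differ at index 0

# Indexing the empty string is what raises the error.
try:
    broken_common[-1]
    failure = None
except IndexError as exc:
    failure = str(exc)

# Proposed ordering: strip the double-slash prefix first, then the single-slash one.
fixed = [
    f.replace("cadc:CHIMEFRB//", "").replace("cadc:CHIMEFRB/", "")
    for f in file_uris
]
fixed_common = os.path.commonprefix(fixed)
```

Note that `os.path.commonprefix` compares character by character, so `fixed_common` here ends partway through the file name rather than at a directory boundary.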
I think that in this case it was due to a change in how we wanted to store the file path at Minoc in the Datatrail database. The old way was to include cadc:CHIMEFRB/ at the beginning, but the new way is to have the file path start with data/... and add the storage root.
The fix here is to edit the database with a script to update the files starting with cadc:CHIMEFRB to the new format, i.e. to remove cadc:CHIMEFRB from the file name.
No, that's not the issue. The existing scripts already find and replace cadc:CHIMEFRB with nothing, i.e. they remove that prefix from the filename, so those differences are handled. The issue is that some are cadc:CHIMEFRB/data... and some are cadc:CHIMEFRB//data, so there is a discrepancy even once the prefix has been removed, depending on whether there are 1 or 2 forward slashes. So I guess the solution is that this edge case also needs to be checked?
Right, I understand, but ultimately it comes from the fact that the data in the database isn't consistent. If you don't want to touch the data in the database, you can reorganise the logic that you mention (I think it's these lines) with something like the following:
  if dataset_locations["file_replica_locations"].get("minoc"):  # type: ignore
      file_uris = dataset_locations["file_replica_locations"]["minoc"]  # type: ignore
-     file_paths = [f.replace("cadc:CHIMEFRB/", "") for f in file_uris]
+     file_paths = [f.replace("//", "/").replace("cadc:CHIMEFRB/", "") for f in file_uris]
-     common_path = os.path.commonprefix(file_paths).replace("//", "/")
+     common_path = os.path.commonprefix(file_paths)
Either way should address this issue.
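If the logic route is taken, a slightly more defensive variant might look like the sketch below. It assumes all files for an event are expected to share one directory, and it swaps the character-wise `os.path.commonprefix` for the component-wise `posixpath.commonpath`. `normalise_uri` and `common_directory` are hypothetical helpers, not part of datatrail.

```python
import posixpath

LEGACY_PREFIX = "cadc:CHIMEFRB"  # legacy prefix as it appears in the listings above


def normalise_uri(uri: str) -> str:
    """Drop the legacy prefix plus any leading or repeated slashes."""
    if uri.startswith(LEGACY_PREFIX):
        uri = uri[len(LEGACY_PREFIX):]
    # lstrip makes every result relative; normpath collapses "a//b" to "a/b".
    return posixpath.normpath(uri.lstrip("/"))


def common_directory(uris: list[str]) -> str:
    """Return the directory shared by all normalised URIs."""
    if not uris:
        raise ValueError("no file URIs given")
    directories = [posixpath.dirname(normalise_uri(u)) for u in uris]
    # commonpath compares whole path components, so it never stops mid-name.
    return posixpath.commonpath(directories)


example = common_directory([
    "data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_725.h5",
    "cadc:CHIMEFRB//data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_811.h5",
    "cadc:CHIMEFRB/data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_7.h5",
])
```

Because every path is made relative before comparison, `commonpath` does not hit its `ValueError` for mixed absolute and relative inputs.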
Right, that makes sense. I guess if this is going to keep happening then fixing the logic is better than triaging the database. If it's due to a change in the past then I'd expect it to be a one-off, in which case it's simpler to just change the database. But then I don't know why it would be different for files within a single event?
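To check whether this really is a one-off, the stored paths could be tallied by prefix style before choosing between the two fixes. A rough sketch follows; `classify`, `tally` and the labels are made up for illustration, and the input is just a list of strings, however it is fetched.

```python
from collections import Counter


def classify(uri: str) -> str:
    """Label a stored path by the prefix style it uses."""
    if uri.startswith("cadc:CHIMEFRB//"):
        return "legacy-double-slash"
    if uri.startswith("cadc:CHIMEFRB/"):
        return "legacy-single-slash"
    if uri.startswith("/"):
        return "absolute"
    return "relative"


def tally(uris: list[str]) -> Counter:
    """Count how many stored paths fall into each prefix style."""
    return Counter(classify(u) for u in uris)


counts = tally([
    "data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_725.h5",
    "cadc:CHIMEFRB//data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_811.h5",
    "cadc:CHIMEFRB//data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_771.h5",
])
```

Running this per event would show whether the double-slash entries are confined to a few events or spread across the database.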
@tjzegmott did the changes you mention already get implemented? It seems like the new code is already in there; if so, we can close this issue.
I didn't create any commits and I don't see the changes in the code. At least not on the main branch. I will create the commit and PR now.
Describe the bug
The datatrail clear command regularly fails to clear staged raw baseband data on /arc. The issue was noticed when an inordinate number of baseband pipeline runs were reporting failures at the final unstage_data step of the pipeline (see e.g. https://frb.chimenet.ca/workflow/web/CHIMEFRB/pipelines/mckinven-baseband-processing/66c78aabc05879f572d2ee8d).
To Reproduce
The issue can be reproduced by running the following command in a Jupyter notebook session on CANFAR (using image: baseband-analysis:lastest)
os.system("datatrail clear chime.event.baseband.raw 214414053 -vvv -f")
The traceback for the above command is the following:
Expected behavior
Raw baseband data temporarily staged at
/arc/projects/chime_frb/data/baseband/raw/YYYY/MM/DD/astro_<event_id>/
should be cleared once the singlebeam.h5 file is produced, and yet for event 214414053 and others like it the staged data remains.