Closed rbonicel closed 10 months ago
I am not sure if I fully understand how your data is organized, do you have a DICOMDIR file in your subject/session folder? Can you give me the path of a DICOM file of two subjects? And to debug, can you run the bidsmapper without the -s
flag (makes it harder for me)?
Hello, I prepared my data according to the https://bidscoin.readthedocs.io/en/stable/preparation.html page for the first test. I organised it adding the "sub-" prefix before the subject ID and renaming the sessions accordingly. My data follows a DICOMDIR layout. Please see below : One subject has dicom and mrs data, the other only has dicom data. The dicom paths are as follows : raw_bids/sub-99181100824104/ses-01/DICOM/IM_001 for example
I used the -s
flag because the runs without it got me an error saying the the used dicom file that were copied temporarily to another location were apparently deleted by the command itself, and for those files, it isn't possible to change the data type or any other naming parameter.
This last screenshot is extracted from the terminal output of the bidsmapper raw_bids/ bids/
command. All the files listed in the bidseditor that need review are deleted from the temp folder.
The path listed in those last screenshots are created by the command itself, as my data follows he DICOMDIR layout.
Hope this helps. I can provide any further information needed. Thanks
Robin
That certainly helps. First of all, you probably don't have to rename your folder using a sub-
prefix, because the prefix can also be passed as an input argument to bidsmapper. Second, I see you have a DICOMDIR file in your session folder, which is going to be used by bidscoin to find your DICOM files. Did you reorganise your DICOM files, i.e. make the DICOMDIR file invalid?
My data is initially organized as follows :
subjectid1_W1 subjectid1_W2 subjectid2_W1
and so on in the origin folder.
I just rearranged the data as previously stated, with a sub-
prefix and the right ses-
prefix according to the timepoint. I then copied the whole dicom folder to another location with the proper sub/ses
prefixes. If I look in the DICOMDIR file, the (0004, 1500) Referenced File ID
should still be valid because the folder tree such as ['DICOM', '00000003', 'IM_1483']
(for example) should be intact. Does bidscoin uses another tag in the DICOMDIR file to find the dicom files, such as 0004, 1400) Offset of the Next Directory Record
?
Because my data is moved around, compressed in tar.gz
for example. But the Referenced File ID
supposedly remains untouched through the process.
Another thing is that the Philips anonymization process on the MRI console deletes the SeriesDescription
tag, so for all dicom files I recreate it by filling it with ProtocolName
tag which still contains the sequence name in my data, or other sequence related info when the ProtocolName
is not present. But this shouldn't corrupt the DICOMDIR file if I'm not mistaken.
I do need the SeriesDescription
tag for the bidscoiner to work, I figured.
I got a bit puzzled because your error is caused by a missing prepended raw_bids
part in the provenance store (i.e. it should have been /bids/code/bidscoin/provenance/raw_bids/sub-etc
. A quick workaround would be to replace the raw_bids
sub-string in your subject/session regexps with provenance
. However, I do like to better understand what is happening here, and solve it properly... Could you try removing the ses-..
folders in raw_bids
, i.e. put the DICOMDIR data directly in the sub-..
folders (same as in the readthedocs page)?
A useful debugging print statement would be to change the lines (around 129) to:
for sesfolder in sesfolders:
if store:
print(f"***DEBUG***: sesfolder={sesfolder}" if unpacked else '')
store = {'source': sesfolder.parent.parent.parent.parent if unpacked else rawfolder.parent,
'target': bidscoinfolder/'provenance'}
I'm all in favor of fixing the issue properly too.
As requested, I removed everything (including the numerous prints in my first comment), and used the following data structure :
I removed the ses-
prefix and copied the data "as is" in both sub-
folders. The only work performed on this data is the creation of the SeriesDescription
tag from the ProtocolName
when available. DICOMDIR file remains untouched, as the files inside the /DICOM
folder. With this file organization, I do not know how I will tackle the session issue, with the W1 and W2. Should the info be written in the (0010, 0010) PatientName
tag ?
I ran bidsmapper raw_bids/ bids/
without any flags, and got the following result :
Those 50 files need renaming according to the bidseditor (and they do, as the acquisition name needs to be uniformized). But those 50 files are also the ones "not found" because seemingly deleted by the script itself after the copy :
I sill don't understand how and why they are deleted. Those selected files seem to cover most of the sequences in the exam, if not all.
With this test, I'm back to step 1, from the first test I did, before this issue was raised.
A useful debugging print statement would be to change the lines (around 129) to:
for sesfolder in sesfolders: if store: print(f"***DEBUG***: sesfolder={sesfolder}" if unpacked else '') store = {'source': sesfolder.parent.parent.parent.parent if unpacked else rawfolder.parent, 'target': bidscoinfolder/'provenance'}
Could you please tell me to which script you are referring in this comment ? In the bids.py
script, around line 129, it's where the properties
is defined.
Sorry, that print statement is for the bidsmapper.py
Good, now the next step. I think DICOM files were not properly named because of the missing (anonymized) SeriesDescription
field. Didn't you get any warning saying: File already exists: ..
?
Good, now the next step. I think DICOM files were not properly named because of the missing (anonymized)
SeriesDescription
field. Didn't you get any warning saying:File already exists: ..
?
Yes absolutely, for all files I think. So a new name was created. Do I need to clean something else before running the bidmapper with clean data ? I checked the /home/robin/.bidscoin/4.2.1/templates
but they seemed untouched.
The waning is like this :
FIY, the start of the command looks like this :
Should I be worried about the Could not validate every run-item in the bidsmap
In your path I see sub-folders names 01-
(right after the subject folder), what are those? That looks suspicious to me...
Should I be worried about the Could not validate every run-item in the bidsmap
No, I should probably delete that print statement. Bidsmaps are checked for correctness when loaded, however, some fields cannot be automatically checked because of some complicated BIDS specifications
Btw, you can also run dicomsort
directly on (a testcopy of) the raw subject folder. That easier / quicker
In your path I see sub-folders names
01-
(right after the subject folder), what are those? That looks suspicious to me...
I do not know. This path is created by the script itself when it notices that the File already exists
and creates a new name. For example : WARNING | File already exists: /tmp/tmpi96jvjbs/home/robin/Documents/Database/PIP/raw_bids/sub-99181100856109/DICOM/00000002/IM_2037 -> /tmp/tmpi96jvjbs/home/robin/Documents/Database/PIP/raw_bids/sub-99181100856109/99181100856109-w1/01-/901-WIP_DTI_PIPDIR_B1000PA_TE75TR6400_ISO2/IM_2037
I understand the first path, as it is what I expect from for my exam folder. But the second path seems created by the script gathering info from the DICOM file itself :
99181100856109-w1
is the content of (0010, 0010) PatientName
tag. 901-WIP_DTI_PIPDIR_B1000PA_TE75TR6400_ISO2
is the concatenation of (0020, 0011) SeriesNumber
tag, a -
and the SeriesDescription
tag. I thought 01-
was the session number added by the bidsmapper command.
Upon looking for the warning in dicomsort.py
, I get the following construction for newfilename
, in the sortsession
function :
So I'm guessing that the construction of the newfilename
in the construct_name
function creates this. Do you know from where this 01-
part could come from ? I didn't see in the dicom header which info could be used for this.
[EDIT] I ran the bidsmapper on a non-anonymized subject, which I requested to the MRI team. So all tags are present, and no modification has been done whatsoever. I removed the patient name on the following screenshot :
So, I still don't know where the 01-
comes from, but the IRM PIP
is stored in (0040, 0254) PerformedProcedureStepDescription
when present. I guess the Philips anon process removes it. So I guess that this part of filepath, although missing some info when the tag above is missing, is still perfectly normal in the renaming process done by the construct_name
function.
I give you below the rest of the prints of the bidsmapper
command on said non ano subject :
I still face the same issue of the files disappearing. All 17 files Discovered 'extra_data' DICOM sample
are not found at the end of the run, and thus can't be renamed.
I'll run dicomsort
alone with some prints and post the result here.
Btw, you can also run
dicomsort
directly on (a testcopy of) the raw subject folder. That easier / quicker
The dicomsort
gave me result, I wasn't using it properly regarding the -i SUBPREFIX
flag. So, running dicomsort '/home/robin/Documents/Database/PIP/raw_bids_testcopy' -i 'sub-'
gave me :
For subject sub-99181100824104
, which is initially composed of :
It yieled :
So there's a lot of missing data. It created 5 folders, though, which is the number of sequences sorted in the folder:
INFO | Searching for subject/session folders in: /home/robin/Documents/Database/PIP/raw_bids_testcopy INFO | Reading: /home/robin/Documents/Database/PIP/raw_bids_testcopy/sub-99181100824104/DICOMDIR INFO | >> Sorting: /home/robin/Documents/Database/PIP/raw_bids_testcopy/sub-99181100824104/99181100824104/01- (5811 files) INFO | Creating: /home/robin/Documents/Database/PIP/raw_bids_testcopy/sub-99181100824104/99181100824104/01-/407-SAG INFO | Creating: /home/robin/Documents/Database/PIP/raw_bids_testcopy/sub-99181100824104/99181100824104/01-/408-AX INFO | Creating: /home/robin/Documents/Database/PIP/raw_bids_testcopy/sub-99181100824104/99181100824104/01-/502-SVS_PRESS_TE35TR2000_64NSA_231943_426 INFO | Creating: /home/robin/Documents/Database/PIP/raw_bids_testcopy/sub-99181100824104/99181100824104/01-/601-FMRI_TE37TR800_DYN500_6MIN47_PA_SENSE1_6 INFO | Creating: /home/robin/Documents/Database/PIP/raw_bids_testcopy/sub-99181100824104/99181100824104/01-/501-SVS_PRESS_TE35TR2000_64NSA
So I need to figure out why the rest of the data was not processed.
I still got the File already exists
warning for every file. How can I clear everything in order to remove this warning ?
Also, the dicomsort
created a subfolder alongside the origin data in the subject folder. Should I remove the origin data from the folder and rearrange sorted data for the following steps, the bidsmapper
and the rest ?
For subject 99181100856109
I got 67 folders created in the folder, and 67 printed lines of creation in the terminal. Everything seems in order for this subject, so I won't post the screenshots. It created 2 folders for each sequence, one containing the dicom files, and another with the XX and PS files, so that seems correct.
But for the first subject, I don't know where the error lies.
I also ran the dicomsort
on the non ano subject. All went well. I'll run the bidsmapper and see how it goes.
I ran the dicomsort
on the non ano subject, 99181100857100
. It works perfectly without any File already exists
warning and gave the following folder :
I renamed the created folder 99181100857100
to sub-99181100857100
(because I got the error that started this issue, ERROR | Could not parse required sub-<label> label from
, and ran the following command : bidsmapper -n 'sub-' '/home/robin/Documents/Database/PIP/raw_bids_non_ano/99181100857100_W1/' '/home/robin/Documents/Database/PIP/bids'
.
I renamed successfully the files that needed :
I then ran bidscoiner '/home/robin/Documents/Database/PIP/raw_bids_non_ano/99181100857100_W1' '/home/robin/Documents/Database/PIP/bids'
and got the following result :
which is what I expected. So I guess with this non ano exam, it worked well.
For the other subjects, I need to keep the origin data as is, but create the SeriesDescription
tag and run everything.
What I don't get is that the dicomsort.py
has these lines
in the construct_name
function, so I shouldn't need to create the extra tag, because I have the ProtocolName
tag available.
So my final questions are :
ProtocolName
tag not taken as an alternative when SeriesDescription
is not available ?File already exists
warning which gets to code to perform operations that are not suitable for the process to succeed ?Thanks for all the help on the matter.
So I guess with this non ano exam, it worked well.
Good, so it seems that missing DICOM fields have caused all the issues. Dicomsort uses SeriesDescription
(or, if that's not available it will fall back to ProtocolName
), as well as SeriesNumber
(or, if that's not available it will fall back to SeriesInstanceUID
). Is the latter perhaps also removed in the anonymization step?
So my final questions are :
- why is the ProtocolName tag not taken as an alternative when SeriesDescription is not available?
I'm pretty sure it will be used and that you don't need to re-write the SeriesDescription field yourself
- How to get rid of the File already exists warning which gets to code to perform operations that are not suitable for the process to succeed?
In principle, the warning should not happen, but it may be the result of missing SeriesNumber
values. The warning itself is innocent (it's just about giving the file a new name). It's really difficult for me to see what is going from a distance. Is there any chance you can give me copy of one of your anonymized datasets (the two sessions of one subject)?
Perhaps you can tell people not to anonymize the data so aggressively, it's really silly and causing massive problems
Perhaps you can tell people not to anonymize the data so aggressively, it's really silly and causing massive problems
Since I started this issue, I asked the MRI team to stop using Philips' anonymization process altogether. I'll take care of it with a lighter routine. The tests gave better results with non-anon data, so I will proceed like this from now on. I need to save old data, though.
Good, so it seems that missing DICOM fields have caused all the issues. Dicomsort uses SeriesDescription (or, if that's not available it will fall back to ProtocolName), as well as SeriesNumber (or, if that's not available it will fall back to SeriesInstanceUID). Is the latter perhaps also removed in the anonymization step?
The last two should be available is th edicom, even with the Philips anon process. I proceeded to two tests : one running dicomsort
on the modified data with the extra tag and one running it on the origin data, using ProtocolName
as the highest available info in your list. Good news is that I get the same result in both cases. So indeed, the re-writing of the SeriesDescription
can be discarded.
I'm pretty sure it will be used and that you don't need to re-write the SeriesDescription field yourself
Yes indeed
But I still have some trouble on some exam folders, with one of the following two errors :
When I try a single exam that gets nothing sorted, I get the following error :
This was the error that got me thinking in the first place that the SeriesDescription
tag was compulsory. This exam is the very first one of the study, the most complicated because the Philips anon process got rid of every useful tag except the SeriesNumber
tag. In such cases, I provided the -f FOLDERCHEME
flag with the only tag available {PatientName}_{SeriesNumber:03d}
because the function didn't fell back on it on its own. But the sorting was complete sequence-wise, which is good news.
I will try with these -f
or -n
flags for the remaining problematic subjects, before resorting to sending you the data.
So the function doesn't fall back automatically to the lowest tag info available, just so I know when to use those flags ?
Thanks
[EDIT] I fixed mot of the errors with the flags or by giving a deeper path to the DICOMDIR file. It corrected most of the "nothing sorted" issues. I'm left with the last two exams :
one doesn't have a DICOMDIR layout and goes as follows :
I don't know which path to feed the dicomsort
function for it to work
the second one has a DICOMDIR layout :
but it only sorts a few sequences of the whole exam. Opening the exam with MRIConvert
I can see all of it, so I don't know if the DICOMDIR file is affected/corrupted or not. Because my guess was that the DICOMDIR file is not complete, hence the function cannot find the path of all sequences.
If you could help me on those last two exams, if I send you the data, that'd be great.
Most of the problems are solved using proper flags and data paths. I'll write a script to properly sort the special cases for the dicomsort
to work properly on all data
You will also get the error: Missing 'SeriesDescription' ..
if both your SeriesDescription and your ProtocolName fields are empty (I suppose that's what happened).
I'd like to understand what causes your issue with the path to the DICOMDIR file, can you perhaps insert print(f"***DEBUG***: sesfolder={sesfolder}" if unpacked else '')
in bidsmapper.py (around line 129, see earlier some post) and report its output? Does the non-anonymized DICOMDIR data work normally if you include the ses-folders?
for sesfolder in sesfolders:
if store:
print(f"***DEBUG***: sesfolder={sesfolder}" if unpacked else '')
store = {'source': sesfolder.parent.parent.parent.parent if unpacked else rawfolder.parent,
'target': bidscoinfolder/'provenance'}
I don't know which path to feed the dicomsort function for it to work
If you lost the DICOMDIR file, you can perhaps run dicomsort directly on the ses-folder? Again, it's unfortunately quite difficult to debug from a distance
but it only sorts a few sequences of the whole exam
Is that for the anonymized data?
If you could help me on those last two exams, if I send you the data, that'd be great.
Sure, just email me so we can arrange a safe transfer
While we set up the transfer for the two remaining exams, I started mapping the correctly sorted exams.
I ran the bidsmapper
function once and started renaming about 60 files (in 650 that needed renaming). I figured that I'd save the bidsmap.yaml
in the default bidsfolder/code/bidscoin
folder corresponding to the cohort.
In said bidsmap.yaml
I indeed find the provenance
of the file I rewrote and all the parameters such as suffix
, acq
, rec
properly renamed.
When I ran the bidsmapper
again, all the same files popped again in the editor, including the ones I already renamed by hand. All files were labeled "exclude", though, where I though that the second run would rename all the same name sequences according to the few first I did by hand. Is it not how it's supposed to be ?
For example, one sequence is named T1ADNI
for one exam, and I renamed it t1anatomical
in the acq
field. I expected that all T1ADNI
in the other exams would be renamed in the same fashion.
Thanks
650 files? Is that renaming in the bidseditor or do you mean the filenames themselves? The latter shouldn't be necessary, so I assume you mean the bidseditor. And that's not good at all! The bidsmap should be a shortlist of the different scans that you made, not a list of all your scans. I suspect that there is a DICOM field in the bidsmap template that varies from scan to scan (can be a number like TE, that varies a tiny little bit). If you remove that and then re-run the bidsmapper (from scratch, e.g. by using -f
) then you will have a very short list
650 files-ish to be renamed in the bidseditor, yes. See below :
The SeriesNumber
may vary, along with some of the sequences names (SeriesDescription/ProtocolName
), but not so much.
In the /home/robin/.bidscoin/4.2.1/templates/bidsmap_dccn.yaml
I commented all instances of EchoTime
. The number of files in the bidseditor went down to 614 (with the use of the -f
flag in bidsmmapper
.)
I then commented all instances of RepetitionTime
(keeping the EchoTime
also commented), it went down to 218 files to rename in the bidseditor. Should I comment more info, such as FlipAngle
? (216 files, not so useful).
The more useful comment was on both EchoTime
and RepetitionTime
so far, as the latter alone discards only about 30 files.
SeriesNumer is not used in the template bidsmap to identity scans. I mean something like this:
attributes: &anat_dicomattr # An empty / non-matching reference
dictionary that can be dereferenced in other run-items of this data
type
Modality:
ProtocolName:
SeriesDescription:
ImageType:
SequenceName:
PulseSequenceName:
SequenceVariant:
ScanningSequence:
EchoPulseSequence: # Enhanced DICOM
MRAcquisitionType:
SliceThickness:
FlipAngle:
EchoNumbers:
EchoTime:
EffectiveEchoTime:
RepetitionTime:
InPlanePhaseEncodingDirection:
Op do 18 jan 2024 om 16:48 schreef Robin Bonicel @.***>:
650 files-ish to be renamed in the bidseditor, yes. See below : Screenshot.from.2024-01-18.16-44-45.png (view on web) https://github.com/Donders-Institute/bidscoin/assets/48990813/a6d4e1ce-5aee-48d3-903a-4831f1893e7f The SeriesNumber may vary, along with some of the sequences names ( SeriesDescription/ProtocolName), but not so much.
How do I cleanly remove/ignore the info that varies from the bidsmap template, please ? I'll start with TR/TE until I get a short list.
— Reply to this email directly, view it on GitHub https://github.com/Donders-Institute/bidscoin/issues/211#issuecomment-1898739949, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADTUGL43DBJ5PPAXVNVG5VLYPE74JAVCNFSM6AAAAABAVE6E6KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJYG4ZTSOJUHE . You are receiving this because you commented.Message ID: @.***>
Your bidseditor should have less than 10 items or so
Op do 18 jan 2024 om 17:17 schreef Marcel Zwiers @.***>:
SeriesNumer is not used in the template bidsmap to identity scans. I mean something like this:
attributes: &anat_dicomattr # An empty / non-matching reference dictionary that can be dereferenced in other run-items of this data type Modality: ProtocolName: SeriesDescription: ImageType: SequenceName: PulseSequenceName: SequenceVariant: ScanningSequence: EchoPulseSequence: # Enhanced DICOM MRAcquisitionType: SliceThickness: FlipAngle: EchoNumbers: EchoTime: EffectiveEchoTime: RepetitionTime: InPlanePhaseEncodingDirection:
Op do 18 jan 2024 om 16:48 schreef Robin Bonicel @.***
:
650 files-ish to be renamed in the bidseditor, yes. See below : Screenshot.from.2024-01-18.16-44-45.png (view on web) https://github.com/Donders-Institute/bidscoin/assets/48990813/a6d4e1ce-5aee-48d3-903a-4831f1893e7f The SeriesNumber may vary, along with some of the sequences names ( SeriesDescription/ProtocolName), but not so much.
How do I cleanly remove/ignore the info that varies from the bidsmap template, please ? I'll start with TR/TE until I get a short list.
— Reply to this email directly, view it on GitHub https://github.com/Donders-Institute/bidscoin/issues/211#issuecomment-1898739949, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADTUGL43DBJ5PPAXVNVG5VLYPE74JAVCNFSM6AAAAABAVE6E6KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJYG4ZTSOJUHE . You are receiving this because you commented.Message ID: @.***>
I will make a FAQ about this, because this a nogo
Op do 18 jan 2024 om 17:18 schreef Marcel Zwiers @.***>:
Your bidseditor should have less than 10 items or so
Op do 18 jan 2024 om 17:17 schreef Marcel Zwiers @.***>:
SeriesNumer is not used in the template bidsmap to identity scans. I mean something like this:
attributes: &anat_dicomattr # An empty / non-matching reference dictionary that can be dereferenced in other run-items of this data type Modality: ProtocolName: SeriesDescription: ImageType: SequenceName: PulseSequenceName: SequenceVariant: ScanningSequence: EchoPulseSequence: # Enhanced DICOM MRAcquisitionType: SliceThickness: FlipAngle: EchoNumbers: EchoTime: EffectiveEchoTime: RepetitionTime: InPlanePhaseEncodingDirection:
Op do 18 jan 2024 om 16:48 schreef Robin Bonicel < @.***>:
650 files-ish to be renamed in the bidseditor, yes. See below : Screenshot.from.2024-01-18.16-44-45.png (view on web) https://github.com/Donders-Institute/bidscoin/assets/48990813/a6d4e1ce-5aee-48d3-903a-4831f1893e7f The SeriesNumber may vary, along with some of the sequences names ( SeriesDescription/ProtocolName), but not so much.
How do I cleanly remove/ignore the info that varies from the bidsmap template, please ? I'll start with TR/TE until I get a short list.
— Reply to this email directly, view it on GitHub https://github.com/Donders-Institute/bidscoin/issues/211#issuecomment-1898739949, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADTUGL43DBJ5PPAXVNVG5VLYPE74JAVCNFSM6AAAAABAVE6E6KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJYG4ZTSOJUHE . You are receiving this because you commented.Message ID: @.***>
I have one or two exams that have no info except the SeriesNumber
, so I'm alright with the fact that I need to rename all those sequences at least once. But for the rest, I have the ProtocolName
. Commenting items in the template doesn't get me lower than 218 files in the bidseditor so far. (I have some PS_
and XX_
files to discard as exclude
, also. I don't get why they are listed as they're not dicom files.)
I guess I'll have to do the renaming on those 218 files, because I don't want to corrupt the template too much by removing too many items. Or am I missing some item that can be safely removed from the template ?
For a single scan (e.g. a T1w) there should only be one entry in the bidseditor. If you have more, then you should open (click on edit) and compare two items and see which DICOM fields are different between the scans. They should be the same
Op do 18 jan 2024 om 17:25 schreef Robin Bonicel @.***>:
I have one or two exams that have no info except the SeriesNumber, so I'm alright with the fact that I need to renamed all those sequences at least once. But for the rest, I have the ProtocolName. Commenting on items doesn't get me lower than 218 files in the bidseditor so far. (I have some PS and XX files to discard as exclude, also. I don't get why they are listed as they're not dicom files.
I guess I'll have to do the renaming on those 218 files, because I don't want to corrupt the template too much by removing too many items. Or am I missing some item that can be safely removed from the template ?
— Reply to this email directly, view it on GitHub https://github.com/Donders-Institute/bidscoin/issues/211#issuecomment-1898807914, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADTUGL7PCLDOI2KFJDSLN6DYPFEFZAVCNFSM6AAAAABAVE6E6KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJYHAYDOOJRGQ . You are receiving this because you commented.Message ID: @.***>
If the fields are different, then bidscoins thinks they are different scans
Op do 18 jan 2024 om 17:33 schreef Marcel Zwiers @.***>:
For a single scan (e.g. a T1w) there should only be one entry in the bidseditor. If you have more, then you should open (click on edit) and compare two items and see which DICOM fields are different between the scans. They should be the same
Op do 18 jan 2024 om 17:25 schreef Robin Bonicel @.***
:
I have one or two exams that have no info except the SeriesNumber, so I'm alright with the fact that I need to renamed all those sequences at least once. But for the rest, I have the ProtocolName. Commenting on items doesn't get me lower than 218 files in the bidseditor so far. (I have some PS and XX files to discard as exclude, also. I don't get why they are listed as they're not dicom files.
I guess I'll have to do the renaming on those 218 files, because I don't want to corrupt the template too much by removing too many items. Or am I missing some item that can be safely removed from the template ?
— Reply to this email directly, view it on GitHub https://github.com/Donders-Institute/bidscoin/issues/211#issuecomment-1898807914, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADTUGL7PCLDOI2KFJDSLN6DYPFEFZAVCNFSM6AAAAABAVE6E6KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJYHAYDOOJRGQ . You are receiving this because you commented.Message ID: @.***>
I check 3 exams so far. With the commenting of both EchoTime
and RepetitionTime
in the bidsmap template, I seem to have only one file per sequence for each exam.
If this is the expected behaviour, then we're good and I need to rename all the 218 files. I excepted the same sequence across every exam (e.g a T1w) to be present in the bidseditor only once according to its name, taking into account that I commented two values that can slightly change between exams (TR and TE).
I'm running the bidseditor on all the cohort for the first time, so one file per sequence and per exam could very well yield 218 files, taking into account that the dicomsort
sorted a lot of useless files such as :
Those files, however non dicom format, would still need sorting as "exclude" data.
You should have one file per sequence for your entire study, not per exam
Op do 18 jan 2024 om 17:50 schreef Robin Bonicel @.***>:
I check 3 exams so far. With the commenting of both EchoTime and RepetitionTime in the bidsmap template, I seem to have only one file per sequence for each exam. If this is the expected behaviour, then we're good and I need to rename all the 218 files. I excepted the same sequence across every exam (e.g a T1w) to be present in the bidseditor only once according to its name, taking into account that I commented two values that can slightly change between exams (TR and TE).
I'm running the bidseditor on all the cohort for the first time, so one file per sequence and per exam could very well yield 218 files, taking into account that the dicomsort sorted a lot of useless files such as : Screenshot.from.2024-01-18.17-49-00.png (view on web) https://github.com/Donders-Institute/bidscoin/assets/48990813/ac5bd8ea-de0e-45da-b02c-26c906617674 Those files, however non dicom format, would still need sorting as "exclude" data.
— Reply to this email directly, view it on GitHub https://github.com/Donders-Institute/bidscoin/issues/211#issuecomment-1898852373, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADTUGL6D72UUYEOF6AMJGJDYPFHFXAVCNFSM6AAAAABAVE6E6KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJYHA2TEMZXGM . You are receiving this because you commented.Message ID: @.***>
The idea is that you tell bidscoin how to convert each scan, then bidscoin will do that for each subject. You don't have to tell bidscoin how to convert individual subjects
Op do 18 jan 2024 om 18:00 schreef Marcel Zwiers @.***>:
You should have one file per sequence for your entire study, not per exam
Op do 18 jan 2024 om 17:50 schreef Robin Bonicel @.***
:
I check 3 exams so far. With the commenting of both EchoTime and RepetitionTime in the bidsmap template, I seem to have only one file per sequence for each exam. If this is the expected behaviour, then we're good and I need to rename all the 218 files. I excepted the same sequence across every exam (e.g a T1w) to be present in the bidseditor only once according to its name, taking into account that I commented two values that can slightly change between exams (TR and TE).
I'm running the bidseditor on all the cohort for the first time, so one file per sequence and per exam could very well yield 218 files, taking into account that the dicomsort sorted a lot of useless files such as : Screenshot.from.2024-01-18.17-49-00.png (view on web) https://github.com/Donders-Institute/bidscoin/assets/48990813/ac5bd8ea-de0e-45da-b02c-26c906617674 Those files, however non dicom format, would still need sorting as "exclude" data.
— Reply to this email directly, view it on GitHub https://github.com/Donders-Institute/bidscoin/issues/211#issuecomment-1898852373, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADTUGL6D72UUYEOF6AMJGJDYPFHFXAVCNFSM6AAAAABAVE6E6KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJYHA2TEMZXGM . You are receiving this because you commented.Message ID: @.***>
Ok, then I'll look into that and check why I have so many files, focusing on the MRI sequences and image files.
Another question in that aspect is that I have a lot of PS_
and XX_
files to process. Those files are non DICOM, hence they have very few attributes
in the bidseditor (which is normal), and their ProtocolName
and SequeceName
vary according to the time of the acquisition, which is written in the name. Given that, I understand that I have a lot of files to rename from the same sequence across exams.
Wouldn't it be better to discard those PS_
and XX_
files from the mapping ? Or are those useful for spectro sorting ? (I don't see the point of them in MRI sequences such s T1w or fMRI.
I started renaming all the 216 listed files in the bidseditor. So far it went according to the expected behaviour. I have a entire exam why only SeriesNumber
as info, so I needed to name every sequence. Other sequences were only different by the ProtocolName
, where a "WIP" was indicated in the field, giving 2 different sequences, e.g. "WIP T2w" and "T2w" referring to the same sequence. So I get why I have so many files.
I just stumble upon the first "duplicated sequence", shown below :
It concerns the same sequence in two different exams, but one has the SeriesDescription
field (probably because it was not anonymized by the Philips process, and the other one doesn't. So I guess the bidseditor will think of those as two different sequences.
This is indeed the main detail that gets me to rename identical sequences : the presence or lack of certain pieces of info in the dicom tag. I could trim the bidsmap template some more to shorten the number of files to edit
I renamed all the files that needed it in the bidseditor, and validated the created bidsmap.
Is it normal that there's a DICOM
field in the bids sequence names, as shown above ?
When running bidscoin /raw /bids
, the terminal prints that it's "leaving out" all the sequences in the exams, not only the labeled "exclude data". I'm left with empty exam folders in the /bids
directory, empty tsv
and json
files. I have some properly renamed in the bidseditor
, though, as shown by the "True" statement when validating the data, so I don't get the absence of conversion.
I started renaming all the 216 listed files in the bidseditor
That's really not the idea
I started renaming all the 216 listed files in the bidseditor
That's really not the idea
I know, but I think it made sense regarding what you told me and my previous comment, about the names (SeriesDescription
and/or ProtocolName
) slightly changing between sequences, or dicom tags present/absent due to the Philips anon process. I think the bidseditor works fine in that aspect, and it's our protocol that was not properly set up.
I did a backup of the created bidsmap, to keep it when using the -f
flag by mistake...which I did.
I'm now trying to find out why the bidscoin
comment leaves out every single sequence without processing a single file when my bidsmap seems properly configured (i.e not every item falls under the exclude
or extra_data
type
I made a fix for your parsing error, but I still need to look into why not all scan types are sorted/found. You can already install and try the latest version like this:
pip install --force-reinstall --no-deps git+https://github.com/Donders-Institute/bidscoin
The renaming warnings are annoying (because there are so many) but harmless
Ok, I added another tweak, the latest version should work much better for you
Ok, I added another tweak, the latest version should work much better for you
Thanks. I'll try it on the two remaining exams I sent you ? This should fix the sorting of those exams ? What about the "leaving out" of every and all sequences ? Will the tweak fix it too ?
I seem to recall in one of my early tests that once I renamed a few sequences in the bidseditor
, saved the bidsmap and reloaded it, every item in the list was labeled as exclude
data, which could well explain current behaviour.
I fixed things on the dataset you sent me, but I'm not sure how you got those hundreds of images in your bidseditor. I didn't get / can't replicate that, so perhaps you can have a look yourself and see what now works and what doesn't. The exclude
datatype is fine, it means it will not go into your bids output repository
Ok, I'll try fixing the dataset.
The main concern with the exclude
datatype is that when running the bidscoin /raw /bids
command, it leaves out every single sequence of every single exam, not converting a single thing, although I properly renamed some files.
Another related question I have is the following : when I rename the fields on a sequence I want to keep, but related to a file to exclude (a PS_
or XX_
file), will it consider every sequence with the same name as exclude
?
Those PS_
or XX_
files are sorted alone in a folder named 000_SequenceNameToKeep
, so I figured that they needed to be removed, as files and also sequences. But maybe I'm mistaken, and that explains that all goes to the exclude
type ?
This is the last issue I need to fix, and all will be right after.
Filenames are irrelevant unless you use <fileparts>
or so in your bidsmap. The automatic datatyping to exclude
has to do with the exclude
run-items in the template bidsmap (they are to greedy for you). I would like it if you could test things without making so many hacks, because 1) it should work out of the box anyway, and 2) it becomes very difficult for me to understand what is caused by bidscoin and what is caused by your hacks
I think your best option would be to run dicomsort first, e.g. like this:
dicomsort rawfolder -i sub- -j ses- -p '.*'
bidsmapper rawfolder bidsfolder
Sure thing, sorry about the complexity of the problem. Since this comment https://github.com/Donders-Institute/bidscoin/issues/211#issuecomment-1894208337, I work on the original data. I completely dropped the dataset where I created extra tags when I figured it was my understanding of the commands and their flags that was at fault.
When I got the result that dicomsort
created the sorted sequence folders alongside the original data ( DICOM
folder and DICOMDIR
file) such as :
I removed the original data before going to the next step. Should I leave it in the folder ? (I was worried it would take duplicated data, both sorted and original folders)
Will run the dicomsort
with the -p
flag on the remaining data.
Sorry again about the inconvenience.
dicomsort rawfolder -i sub- -j ses- -p '.*'
This command with the -p
flag sorted all data in the two remaining exams I had (the ones I sent you). Great news, all my exams are now properly sorted !
Good, and you still have a long- instead of a short-list in the bidseditor?
Hello,
Describe the bug When running the
bidsmapper -s raw/ bids/
, I get from some files that the sub-label cannot be parsed from a filepath such as "/home/robin/Documents/Database/PIP/bids/code/bidscoin/provenance/sub-99181100824104/ses-01/99181100824104/01-/601-FMRI_TE37TR800_DYN500_6MIN47_PA_SENSE1_6/IM_0001". Because of that, I cannot change the file name in the BIDS Editor GUI from those listed files because the subject id is not found.To reproduce Not sure the problem can be reproduced without my dataset
Expected behavior I excepted the parser to find the subject's PSC1 ID as it is clearly stated in the file path listed in the error.
Screenshots This error prevents me to properly rename the files listed in the GUI :
Here are the prints I added in the bids.py script, in the
subid_sesid
function and in thedynamicvalue
function Here is the terminal screenshot I get from all those prints :Software version
Additional context I checked my DICOM files to see if the PatientName tag (0010, 0010) was present and properly filled for said subject, and it is the case.
In some of the "value" prints, I get
<<filepath:/raw_bids/sub-.*?/ses-(.*?)/>>
without parenthesis in the "sub regex. Is that expected, as the code line initially defines the subid as<<filepath:/raw_bids/sub-(.*?)/>>
? Does the absence of parenthesis prevents to regex to work properly ? This error occurs on 50 files amongst 2 subjects in my database, as it is the first time I run it as a test, and to properly fill in the bidsmap.yaml.Thanks Robin