Donders-Institute / bidscoin

BIDScoin converts your source-level neuroimaging data to BIDS
https://bidscoin.readthedocs.io
GNU General Public License v3.0
130 stars 35 forks source link

ERROR | Could not parse required sub-<label> label #211

Closed rbonicel closed 10 months ago

rbonicel commented 11 months ago

Hello,

Describe the bug When running the bidsmapper -s raw/ bids/, I get from some files that the sub-label cannot be parsed from a filepath such as "/home/robin/Documents/Database/PIP/bids/code/bidscoin/provenance/sub-99181100824104/ses-01/99181100824104/01-/601-FMRI_TE37TR800_DYN500_6MIN47_PA_SENSE1_6/IM_0001". Because of that, I cannot change the file name in the BIDS Editor GUI from those listed files because the subject id is not found.

To reproduce Not sure the problem can be reproduced without my dataset

Expected behavior I excepted the parser to find the subject's PSC1 ID as it is clearly stated in the file path listed in the error.

Screenshots This error prevents me to properly rename the files listed in the GUI : Screenshot from 2023-12-14 17-43-04

Here are the prints I added in the bids.py script, in the subid_sesid function and in the dynamicvalue function Screenshot from 2023-12-14 17-33-04 Screenshot from 2023-12-14 17-33-20 Here is the terminal screenshot I get from all those prints : Screenshot from 2023-12-14 17-37-38

Software version

Additional context I checked my DICOM files to see if the PatientName tag (0010, 0010) was present and properly filled for said subject, and it is the case.

In some of the "value" prints, I get <<filepath:/raw_bids/sub-.*?/ses-(.*?)/>> without parenthesis in the "sub regex. Is that expected, as the code line initially defines the subid as <<filepath:/raw_bids/sub-(.*?)/>> ? Does the absence of parenthesis prevents to regex to work properly ? This error occurs on 50 files amongst 2 subjects in my database, as it is the first time I run it as a test, and to properly fill in the bidsmap.yaml.

Thanks Robin

marcelzwiers commented 11 months ago

I am not sure if I fully understand how your data is organized, do you have a DICOMDIR file in your subject/session folder? Can you give me the path of a DICOM file of two subjects? And to debug, can you run the bidsmapper without the -s flag (makes it harder for me)?

marcelzwiers commented 11 months ago

See also: https://bidscoin.readthedocs.io/en/stable/preparation.html

rbonicel commented 10 months ago

Hello, I prepared my data according to the https://bidscoin.readthedocs.io/en/stable/preparation.html page for the first test. I organised it adding the "sub-" prefix before the subject ID and renaming the sessions accordingly. My data follows a DICOMDIR layout. Please see below : Screenshot from 2024-01-02 10-01-04 One subject has dicom and mrs data, the other only has dicom data. The dicom paths are as follows : raw_bids/sub-99181100824104/ses-01/DICOM/IM_001 for example

I used the -s flag because the runs without it got me an error saying the the used dicom file that were copied temporarily to another location were apparently deleted by the command itself, and for those files, it isn't possible to change the data type or any other naming parameter. Screenshot from 2024-01-02 11-41-13 Screenshot from 2024-01-02 11-41-27

This last screenshot is extracted from the terminal output of the bidsmapper raw_bids/ bids/ command. All the files listed in the bidseditor that need review are deleted from the temp folder. The path listed in those last screenshots are created by the command itself, as my data follows he DICOMDIR layout.

Hope this helps. I can provide any further information needed. Thanks

Robin

marcelzwiers commented 10 months ago

That certainly helps. First of all, you probably don't have to rename your folder using a sub- prefix, because the prefix can also be passed as an input argument to bidsmapper. Second, I see you have a DICOMDIR file in your session folder, which is going to be used by bidscoin to find your DICOM files. Did you reorganise your DICOM files, i.e. make the DICOMDIR file invalid?

rbonicel commented 10 months ago

My data is initially organized as follows : subjectid1_W1 subjectid1_W2 subjectid2_W1 and so on in the origin folder.

I just rearranged the data as previously stated, with a sub- prefix and the right ses- prefix according to the timepoint. I then copied the whole dicom folder to another location with the proper sub/ses prefixes. If I look in the DICOMDIR file, the (0004, 1500) Referenced File ID should still be valid because the folder tree such as ['DICOM', '00000003', 'IM_1483'] (for example) should be intact. Does bidscoin uses another tag in the DICOMDIR file to find the dicom files, such as 0004, 1400) Offset of the Next Directory Record ?

Because my data is moved around, compressed in tar.gz for example. But the Referenced File ID supposedly remains untouched through the process.

Another thing is that the Philips anonymization process on the MRI console deletes the SeriesDescription tag, so for all dicom files I recreate it by filling it with ProtocolName tag which still contains the sequence name in my data, or other sequence related info when the ProtocolName is not present. But this shouldn't corrupt the DICOMDIR file if I'm not mistaken. I do need the SeriesDescription tag for the bidscoiner to work, I figured.

marcelzwiers commented 10 months ago

I got a bit puzzled because your error is caused by a missing prepended raw_bids part in the provenance store (i.e. it should have been /bids/code/bidscoin/provenance/raw_bids/sub-etc. A quick workaround would be to replace the raw_bids sub-string in your subject/session regexps with provenance. However, I do like to better understand what is happening here, and solve it properly... Could you try removing the ses-.. folders in raw_bids, i.e. put the DICOMDIR data directly in the sub-.. folders (same as in the readthedocs page)?

marcelzwiers commented 10 months ago

A useful debugging print statement would be to change the lines (around 129) to:

                for sesfolder in sesfolders:
                    if store:
                        print(f"***DEBUG***: sesfolder={sesfolder}" if unpacked else '')
                        store = {'source': sesfolder.parent.parent.parent.parent if unpacked else rawfolder.parent,
                                 'target': bidscoinfolder/'provenance'}
rbonicel commented 10 months ago

I'm all in favor of fixing the issue properly too.

As requested, I removed everything (including the numerous prints in my first comment), and used the following data structure : Screenshot from 2024-01-03 16-44-49 I removed the ses- prefix and copied the data "as is" in both sub- folders. The only work performed on this data is the creation of the SeriesDescription tag from the ProtocolName when available. DICOMDIR file remains untouched, as the files inside the /DICOM folder. With this file organization, I do not know how I will tackle the session issue, with the W1 and W2. Should the info be written in the (0010, 0010) PatientName tag ?

I ran bidsmapper raw_bids/ bids/ without any flags, and got the following result : Screenshot from 2024-01-03 16-52-12 Those 50 files need renaming according to the bidseditor (and they do, as the acquisition name needs to be uniformized). But those 50 files are also the ones "not found" because seemingly deleted by the script itself after the copy : Screenshot from 2024-01-03 16-50-09 I sill don't understand how and why they are deleted. Those selected files seem to cover most of the sequences in the exam, if not all.

With this test, I'm back to step 1, from the first test I did, before this issue was raised.

rbonicel commented 10 months ago

A useful debugging print statement would be to change the lines (around 129) to:

                for sesfolder in sesfolders:
                    if store:
                        print(f"***DEBUG***: sesfolder={sesfolder}" if unpacked else '')
                        store = {'source': sesfolder.parent.parent.parent.parent if unpacked else rawfolder.parent,
                                 'target': bidscoinfolder/'provenance'}

Could you please tell me to which script you are referring in this comment ? In the bids.py script, around line 129, it's where the properties is defined.

marcelzwiers commented 10 months ago

Sorry, that print statement is for the bidsmapper.py

marcelzwiers commented 10 months ago

Good, now the next step. I think DICOM files were not properly named because of the missing (anonymized) SeriesDescription field. Didn't you get any warning saying: File already exists: ..?

rbonicel commented 10 months ago

Good, now the next step. I think DICOM files were not properly named because of the missing (anonymized) SeriesDescription field. Didn't you get any warning saying: File already exists: ..?

Yes absolutely, for all files I think. So a new name was created. Do I need to clean something else before running the bidmapper with clean data ? I checked the /home/robin/.bidscoin/4.2.1/templates but they seemed untouched.

The waning is like this : Screenshot from 2024-01-03 17-39-18

FIY, the start of the command looks like this : Screenshot from 2024-01-03 17-37-10 Should I be worried about the Could not validate every run-item in the bidsmap

marcelzwiers commented 10 months ago

In your path I see sub-folders names 01- (right after the subject folder), what are those? That looks suspicious to me...

marcelzwiers commented 10 months ago

Should I be worried about the Could not validate every run-item in the bidsmap

No, I should probably delete that print statement. Bidsmaps are checked for correctness when loaded, however, some fields cannot be automatically checked because of some complicated BIDS specifications

marcelzwiers commented 10 months ago

Btw, you can also run dicomsort directly on (a testcopy of) the raw subject folder. That easier / quicker

rbonicel commented 10 months ago

In your path I see sub-folders names 01- (right after the subject folder), what are those? That looks suspicious to me...

I do not know. This path is created by the script itself when it notices that the File already exists and creates a new name. For example : WARNING | File already exists: /tmp/tmpi96jvjbs/home/robin/Documents/Database/PIP/raw_bids/sub-99181100856109/DICOM/00000002/IM_2037 -> /tmp/tmpi96jvjbs/home/robin/Documents/Database/PIP/raw_bids/sub-99181100856109/99181100856109-w1/01-/901-WIP_DTI_PIPDIR_B1000PA_TE75TR6400_ISO2/IM_2037

I understand the first path, as it is what I expect from for my exam folder. But the second path seems created by the script gathering info from the DICOM file itself : 99181100856109-w1 is the content of (0010, 0010) PatientName tag. 901-WIP_DTI_PIPDIR_B1000PA_TE75TR6400_ISO2 is the concatenation of (0020, 0011) SeriesNumber tag, a - and the SeriesDescription tag. I thought 01- was the session number added by the bidsmapper command.

Upon looking for the warning in dicomsort.py, I get the following construction for newfilename, in the sortsession function : Screenshot from 2024-01-04 09-59-03

So I'm guessing that the construction of the newfilename in the construct_name function creates this. Do you know from where this 01- part could come from ? I didn't see in the dicom header which info could be used for this.

[EDIT] I ran the bidsmapper on a non-anonymized subject, which I requested to the MRI team. So all tags are present, and no modification has been done whatsoever. I removed the patient name on the following screenshot :
Screenshot from 2024-01-04 10-42-44

So, I still don't know where the 01- comes from, but the IRM PIP is stored in (0040, 0254) PerformedProcedureStepDescription when present. I guess the Philips anon process removes it. So I guess that this part of filepath, although missing some info when the tag above is missing, is still perfectly normal in the renaming process done by the construct_name function.

I give you below the rest of the prints of the bidsmapper command on said non ano subject : Screenshot from 2024-01-04 10-55-23

I still face the same issue of the files disappearing. All 17 files Discovered 'extra_data' DICOM sample are not found at the end of the run, and thus can't be renamed.

I'll run dicomsort alone with some prints and post the result here.

rbonicel commented 10 months ago

Btw, you can also run dicomsort directly on (a testcopy of) the raw subject folder. That easier / quicker

The dicomsort gave me result, I wasn't using it properly regarding the -i SUBPREFIX flag. So, running dicomsort '/home/robin/Documents/Database/PIP/raw_bids_testcopy' -i 'sub-' gave me :

For subject sub-99181100824104, which is initially composed of : Screenshot from 2024-01-04 11-21-17 It yieled : Screenshot from 2024-01-04 11-22-17 So there's a lot of missing data. It created 5 folders, though, which is the number of sequences sorted in the folder:

INFO | Searching for subject/session folders in: /home/robin/Documents/Database/PIP/raw_bids_testcopy INFO | Reading: /home/robin/Documents/Database/PIP/raw_bids_testcopy/sub-99181100824104/DICOMDIR INFO | >> Sorting: /home/robin/Documents/Database/PIP/raw_bids_testcopy/sub-99181100824104/99181100824104/01- (5811 files) INFO | Creating: /home/robin/Documents/Database/PIP/raw_bids_testcopy/sub-99181100824104/99181100824104/01-/407-SAG INFO | Creating: /home/robin/Documents/Database/PIP/raw_bids_testcopy/sub-99181100824104/99181100824104/01-/408-AX INFO | Creating: /home/robin/Documents/Database/PIP/raw_bids_testcopy/sub-99181100824104/99181100824104/01-/502-SVS_PRESS_TE35TR2000_64NSA_231943_426 INFO | Creating: /home/robin/Documents/Database/PIP/raw_bids_testcopy/sub-99181100824104/99181100824104/01-/601-FMRI_TE37TR800_DYN500_6MIN47_PA_SENSE1_6 INFO | Creating: /home/robin/Documents/Database/PIP/raw_bids_testcopy/sub-99181100824104/99181100824104/01-/501-SVS_PRESS_TE35TR2000_64NSA

So I need to figure out why the rest of the data was not processed.

I still got the File already exists warning for every file. How can I clear everything in order to remove this warning ?

Also, the dicomsort created a subfolder alongside the origin data in the subject folder. Should I remove the origin data from the folder and rearrange sorted data for the following steps, the bidsmapper and the rest ?

For subject 99181100856109 I got 67 folders created in the folder, and 67 printed lines of creation in the terminal. Everything seems in order for this subject, so I won't post the screenshots. It created 2 folders for each sequence, one containing the dicom files, and another with the XX and PS files, so that seems correct. But for the first subject, I don't know where the error lies.

I also ran the dicomsort on the non ano subject. All went well. I'll run the bidsmapper and see how it goes.

rbonicel commented 10 months ago

I ran the dicomsort on the non ano subject, 99181100857100. It works perfectly without any File already exists warning and gave the following folder : Screenshot from 2024-01-04 13-49-54 I renamed the created folder 99181100857100 to sub-99181100857100 (because I got the error that started this issue, ERROR | Could not parse required sub-<label> label from, and ran the following command : bidsmapper -n 'sub-' '/home/robin/Documents/Database/PIP/raw_bids_non_ano/99181100857100_W1/' '/home/robin/Documents/Database/PIP/bids'. Screenshot from 2024-01-04 13-41-45

I renamed successfully the files that needed : Screenshot from 2024-01-04 13-42-07

I then ran bidscoiner '/home/robin/Documents/Database/PIP/raw_bids_non_ano/99181100857100_W1' '/home/robin/Documents/Database/PIP/bids' and got the following result : Screenshot from 2024-01-04 13-44-23 which is what I expected. So I guess with this non ano exam, it worked well.

For the other subjects, I need to keep the origin data as is, but create the SeriesDescription tag and run everything. What I don't get is that the dicomsort.py has these lines Screenshot from 2024-01-04 14-00-40 in the construct_name function, so I shouldn't need to create the extra tag, because I have the ProtocolName tag available.

So my final questions are :

Thanks for all the help on the matter.

marcelzwiers commented 10 months ago

So I guess with this non ano exam, it worked well.

Good, so it seems that missing DICOM fields have caused all the issues. Dicomsort uses SeriesDescription (or, if that's not available it will fall back to ProtocolName), as well as SeriesNumber (or, if that's not available it will fall back to SeriesInstanceUID). Is the latter perhaps also removed in the anonymization step?

So my final questions are :

  • why is the ProtocolName tag not taken as an alternative when SeriesDescription is not available?

I'm pretty sure it will be used and that you don't need to re-write the SeriesDescription field yourself

  • How to get rid of the File already exists warning which gets to code to perform operations that are not suitable for the process to succeed?

In principle, the warning should not happen, but it may be the result of missing SeriesNumber values. The warning itself is innocent (it's just about giving the file a new name). It's really difficult for me to see what is going from a distance. Is there any chance you can give me copy of one of your anonymized datasets (the two sessions of one subject)?

marcelzwiers commented 10 months ago

Perhaps you can tell people not to anonymize the data so aggressively, it's really silly and causing massive problems

rbonicel commented 10 months ago

Perhaps you can tell people not to anonymize the data so aggressively, it's really silly and causing massive problems

Since I started this issue, I asked the MRI team to stop using Philips' anonymization process altogether. I'll take care of it with a lighter routine. The tests gave better results with non-anon data, so I will proceed like this from now on. I need to save old data, though.

Good, so it seems that missing DICOM fields have caused all the issues. Dicomsort uses SeriesDescription (or, if that's not available it will fall back to ProtocolName), as well as SeriesNumber (or, if that's not available it will fall back to SeriesInstanceUID). Is the latter perhaps also removed in the anonymization step?

The last two should be available is th edicom, even with the Philips anon process. I proceeded to two tests : one running dicomsort on the modified data with the extra tag and one running it on the origin data, using ProtocolName as the highest available info in your list. Good news is that I get the same result in both cases. So indeed, the re-writing of the SeriesDescription can be discarded.

I'm pretty sure it will be used and that you don't need to re-write the SeriesDescription field yourself

Yes indeed

But I still have some trouble on some exam folders, with one of the following two errors :

When I try a single exam that gets nothing sorted, I get the following error : Screenshot from 2024-01-17 11-25-05

This was the error that got me thinking in the first place that the SeriesDescription tag was compulsory. This exam is the very first one of the study, the most complicated because the Philips anon process got rid of every useful tag except the SeriesNumber tag. In such cases, I provided the -f FOLDERCHEME flag with the only tag available {PatientName}_{SeriesNumber:03d} because the function didn't fell back on it on its own. But the sorting was complete sequence-wise, which is good news.

Screenshot from 2024-01-17 11-40-23

I will try with these -f or -n flags for the remaining problematic subjects, before resorting to sending you the data. So the function doesn't fall back automatically to the lowest tag info available, just so I know when to use those flags ?

Thanks

[EDIT] I fixed mot of the errors with the flags or by giving a deeper path to the DICOMDIR file. It corrected most of the "nothing sorted" issues. I'm left with the last two exams :

If you could help me on those last two exams, if I send you the data, that'd be great. Most of the problems are solved using proper flags and data paths. I'll write a script to properly sort the special cases for the dicomsort to work properly on all data

marcelzwiers commented 10 months ago

You will also get the error: Missing 'SeriesDescription' .. if both your SeriesDescription and your ProtocolName fields are empty (I suppose that's what happened).

I'd like to understand what causes your issue with the path to the DICOMDIR file, can you perhaps insert print(f"***DEBUG***: sesfolder={sesfolder}" if unpacked else '') in bidsmapper.py (around line 129, see earlier some post) and report its output? Does the non-anonymized DICOMDIR data work normally if you include the ses-folders?

                for sesfolder in sesfolders:
                    if store:
                        print(f"***DEBUG***: sesfolder={sesfolder}" if unpacked else '')
                        store = {'source': sesfolder.parent.parent.parent.parent if unpacked else rawfolder.parent,
                                 'target': bidscoinfolder/'provenance'}

I don't know which path to feed the dicomsort function for it to work

If you lost the DICOMDIR file, you can perhaps run dicomsort directly on the ses-folder? Again, it's unfortunately quite difficult to debug from a distance

but it only sorts a few sequences of the whole exam

Is that for the anonymized data?

marcelzwiers commented 10 months ago

If you could help me on those last two exams, if I send you the data, that'd be great.

Sure, just email me so we can arrange a safe transfer

rbonicel commented 10 months ago

While we set up the transfer for the two remaining exams, I started mapping the correctly sorted exams. I ran the bidsmapper function once and started renaming about 60 files (in 650 that needed renaming). I figured that I'd save the bidsmap.yaml in the default bidsfolder/code/bidscoin folder corresponding to the cohort.

In said bidsmap.yaml I indeed find the provenance of the file I rewrote and all the parameters such as suffix, acq, rec properly renamed.

When I ran the bidsmapper again, all the same files popped again in the editor, including the ones I already renamed by hand. All files were labeled "exclude", though, where I though that the second run would rename all the same name sequences according to the few first I did by hand. Is it not how it's supposed to be ?

For example, one sequence is named T1ADNI for one exam, and I renamed it t1anatomical in the acq field. I expected that all T1ADNI in the other exams would be renamed in the same fashion.

Thanks

marcelzwiers commented 10 months ago

650 files? Is that renaming in the bidseditor or do you mean the filenames themselves? The latter shouldn't be necessary, so I assume you mean the bidseditor. And that's not good at all! The bidsmap should be a shortlist of the different scans that you made, not a list of all your scans. I suspect that there is a DICOM field in the bidsmap template that varies from scan to scan (can be a number like TE, that varies a tiny little bit). If you remove that and then re-run the bidsmapper (from scratch, e.g. by using -f) then you will have a very short list

rbonicel commented 10 months ago

650 files-ish to be renamed in the bidseditor, yes. See below : Screenshot from 2024-01-18 16-44-45 The SeriesNumber may vary, along with some of the sequences names (SeriesDescription/ProtocolName), but not so much.

In the /home/robin/.bidscoin/4.2.1/templates/bidsmap_dccn.yaml I commented all instances of EchoTime. The number of files in the bidseditor went down to 614 (with the use of the -f flag in bidsmmapper.) I then commented all instances of RepetitionTime(keeping the EchoTime also commented), it went down to 218 files to rename in the bidseditor. Should I comment more info, such as FlipAngle ? (216 files, not so useful).

The more useful comment was on both EchoTime and RepetitionTime so far, as the latter alone discards only about 30 files.

marcelzwiers commented 10 months ago

SeriesNumer is not used in the template bidsmap to identity scans. I mean something like this:


attributes: &anat_dicomattr   # An empty / non-matching reference
dictionary that can be dereferenced in other run-items of this data
type
  Modality:
  ProtocolName:
  SeriesDescription:
  ImageType:
  SequenceName:
  PulseSequenceName:
  SequenceVariant:
  ScanningSequence:
  EchoPulseSequence:          # Enhanced DICOM
  MRAcquisitionType:
  SliceThickness:
  FlipAngle:
  EchoNumbers:
  EchoTime:
  EffectiveEchoTime:
  RepetitionTime:
  InPlanePhaseEncodingDirection:

Op do 18 jan 2024 om 16:48 schreef Robin Bonicel @.***>:

650 files-ish to be renamed in the bidseditor, yes. See below : Screenshot.from.2024-01-18.16-44-45.png (view on web) https://github.com/Donders-Institute/bidscoin/assets/48990813/a6d4e1ce-5aee-48d3-903a-4831f1893e7f The SeriesNumber may vary, along with some of the sequences names ( SeriesDescription/ProtocolName), but not so much.

How do I cleanly remove/ignore the info that varies from the bidsmap template, please ? I'll start with TR/TE until I get a short list.

— Reply to this email directly, view it on GitHub https://github.com/Donders-Institute/bidscoin/issues/211#issuecomment-1898739949, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADTUGL43DBJ5PPAXVNVG5VLYPE74JAVCNFSM6AAAAABAVE6E6KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJYG4ZTSOJUHE . You are receiving this because you commented.Message ID: @.***>

marcelzwiers commented 10 months ago

Your bidseditor should have less than 10 items or so

Op do 18 jan 2024 om 17:17 schreef Marcel Zwiers @.***>:

SeriesNumer is not used in the template bidsmap to identity scans. I mean something like this:


attributes: &anat_dicomattr   # An empty / non-matching reference dictionary that can be dereferenced in other run-items of this data type
  Modality:
  ProtocolName:
  SeriesDescription:
  ImageType:
  SequenceName:
  PulseSequenceName:
  SequenceVariant:
  ScanningSequence:
  EchoPulseSequence:          # Enhanced DICOM
  MRAcquisitionType:
  SliceThickness:
  FlipAngle:
  EchoNumbers:
  EchoTime:
  EffectiveEchoTime:
  RepetitionTime:
  InPlanePhaseEncodingDirection:

Op do 18 jan 2024 om 16:48 schreef Robin Bonicel @.***

:

650 files-ish to be renamed in the bidseditor, yes. See below : Screenshot.from.2024-01-18.16-44-45.png (view on web) https://github.com/Donders-Institute/bidscoin/assets/48990813/a6d4e1ce-5aee-48d3-903a-4831f1893e7f The SeriesNumber may vary, along with some of the sequences names ( SeriesDescription/ProtocolName), but not so much.

How do I cleanly remove/ignore the info that varies from the bidsmap template, please ? I'll start with TR/TE until I get a short list.

— Reply to this email directly, view it on GitHub https://github.com/Donders-Institute/bidscoin/issues/211#issuecomment-1898739949, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADTUGL43DBJ5PPAXVNVG5VLYPE74JAVCNFSM6AAAAABAVE6E6KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJYG4ZTSOJUHE . You are receiving this because you commented.Message ID: @.***>

marcelzwiers commented 10 months ago

I will make a FAQ about this, because this a nogo

Op do 18 jan 2024 om 17:18 schreef Marcel Zwiers @.***>:

Your bidseditor should have less than 10 items or so

Op do 18 jan 2024 om 17:17 schreef Marcel Zwiers @.***>:

SeriesNumer is not used in the template bidsmap to identity scans. I mean something like this:


attributes: &anat_dicomattr   # An empty / non-matching reference dictionary that can be dereferenced in other run-items of this data type
  Modality:
  ProtocolName:
  SeriesDescription:
  ImageType:
  SequenceName:
  PulseSequenceName:
  SequenceVariant:
  ScanningSequence:
  EchoPulseSequence:          # Enhanced DICOM
  MRAcquisitionType:
  SliceThickness:
  FlipAngle:
  EchoNumbers:
  EchoTime:
  EffectiveEchoTime:
  RepetitionTime:
  InPlanePhaseEncodingDirection:

Op do 18 jan 2024 om 16:48 schreef Robin Bonicel < @.***>:

650 files-ish to be renamed in the bidseditor, yes. See below : Screenshot.from.2024-01-18.16-44-45.png (view on web) https://github.com/Donders-Institute/bidscoin/assets/48990813/a6d4e1ce-5aee-48d3-903a-4831f1893e7f The SeriesNumber may vary, along with some of the sequences names ( SeriesDescription/ProtocolName), but not so much.

How do I cleanly remove/ignore the info that varies from the bidsmap template, please ? I'll start with TR/TE until I get a short list.

— Reply to this email directly, view it on GitHub https://github.com/Donders-Institute/bidscoin/issues/211#issuecomment-1898739949, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADTUGL43DBJ5PPAXVNVG5VLYPE74JAVCNFSM6AAAAABAVE6E6KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJYG4ZTSOJUHE . You are receiving this because you commented.Message ID: @.***>

rbonicel commented 10 months ago

I have one or two exams that have no info except the SeriesNumber, so I'm alright with the fact that I need to rename all those sequences at least once. But for the rest, I have the ProtocolName. Commenting items in the template doesn't get me lower than 218 files in the bidseditor so far. (I have some PS_ and XX_ files to discard as exclude, also. I don't get why they are listed as they're not dicom files.)

I guess I'll have to do the renaming on those 218 files, because I don't want to corrupt the template too much by removing too many items. Or am I missing some item that can be safely removed from the template ?

marcelzwiers commented 10 months ago

For a single scan (e.g. a T1w) there should only be one entry in the bidseditor. If you have more, then you should open (click on edit) and compare two items and see which DICOM fields are different between the scans. They should be the same

Op do 18 jan 2024 om 17:25 schreef Robin Bonicel @.***>:

I have one or two exams that have no info except the SeriesNumber, so I'm alright with the fact that I need to renamed all those sequences at least once. But for the rest, I have the ProtocolName. Commenting on items doesn't get me lower than 218 files in the bidseditor so far. (I have some PS and XX files to discard as exclude, also. I don't get why they are listed as they're not dicom files.

I guess I'll have to do the renaming on those 218 files, because I don't want to corrupt the template too much by removing too many items. Or am I missing some item that can be safely removed from the template ?

— Reply to this email directly, view it on GitHub https://github.com/Donders-Institute/bidscoin/issues/211#issuecomment-1898807914, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADTUGL7PCLDOI2KFJDSLN6DYPFEFZAVCNFSM6AAAAABAVE6E6KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJYHAYDOOJRGQ . You are receiving this because you commented.Message ID: @.***>

marcelzwiers commented 10 months ago

If the fields are different, then bidscoins thinks they are different scans

Op do 18 jan 2024 om 17:33 schreef Marcel Zwiers @.***>:

For a single scan (e.g. a T1w) there should only be one entry in the bidseditor. If you have more, then you should open (click on edit) and compare two items and see which DICOM fields are different between the scans. They should be the same

Op do 18 jan 2024 om 17:25 schreef Robin Bonicel @.***

:

I have one or two exams that have no info except the SeriesNumber, so I'm alright with the fact that I need to renamed all those sequences at least once. But for the rest, I have the ProtocolName. Commenting on items doesn't get me lower than 218 files in the bidseditor so far. (I have some PS and XX files to discard as exclude, also. I don't get why they are listed as they're not dicom files.

I guess I'll have to do the renaming on those 218 files, because I don't want to corrupt the template too much by removing too many items. Or am I missing some item that can be safely removed from the template ?

— Reply to this email directly, view it on GitHub https://github.com/Donders-Institute/bidscoin/issues/211#issuecomment-1898807914, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADTUGL7PCLDOI2KFJDSLN6DYPFEFZAVCNFSM6AAAAABAVE6E6KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJYHAYDOOJRGQ . You are receiving this because you commented.Message ID: @.***>

rbonicel commented 10 months ago

I check 3 exams so far. With the commenting of both EchoTime and RepetitionTime in the bidsmap template, I seem to have only one file per sequence for each exam. If this is the expected behaviour, then we're good and I need to rename all the 218 files. I excepted the same sequence across every exam (e.g a T1w) to be present in the bidseditor only once according to its name, taking into account that I commented two values that can slightly change between exams (TR and TE).

I'm running the bidseditor on all the cohort for the first time, so one file per sequence and per exam could very well yield 218 files, taking into account that the dicomsort sorted a lot of useless files such as : Screenshot from 2024-01-18 17-49-00 Those files, however non dicom format, would still need sorting as "exclude" data.

marcelzwiers commented 10 months ago

You should have one file per sequence for your entire study, not per exam

Op do 18 jan 2024 om 17:50 schreef Robin Bonicel @.***>:

I check 3 exams so far. With the commenting of both EchoTime and RepetitionTime in the bidsmap template, I seem to have only one file per sequence for each exam. If this is the expected behaviour, then we're good and I need to rename all the 218 files. I excepted the same sequence across every exam (e.g a T1w) to be present in the bidseditor only once according to its name, taking into account that I commented two values that can slightly change between exams (TR and TE).

I'm running the bidseditor on all the cohort for the first time, so one file per sequence and per exam could very well yield 218 files, taking into account that the dicomsort sorted a lot of useless files such as : Screenshot.from.2024-01-18.17-49-00.png (view on web) https://github.com/Donders-Institute/bidscoin/assets/48990813/ac5bd8ea-de0e-45da-b02c-26c906617674 Those files, however non dicom format, would still need sorting as "exclude" data.

— Reply to this email directly, view it on GitHub https://github.com/Donders-Institute/bidscoin/issues/211#issuecomment-1898852373, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADTUGL6D72UUYEOF6AMJGJDYPFHFXAVCNFSM6AAAAABAVE6E6KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJYHA2TEMZXGM . You are receiving this because you commented.Message ID: @.***>

marcelzwiers commented 10 months ago

The idea is that you tell bidscoin how to convert each scan, then bidscoin will do that for each subject. You don't have to tell bidscoin how to convert individual subjects

Op do 18 jan 2024 om 18:00 schreef Marcel Zwiers @.***>:

You should have one file per sequence for your entire study, not per exam

Op do 18 jan 2024 om 17:50 schreef Robin Bonicel @.***

:

I check 3 exams so far. With the commenting of both EchoTime and RepetitionTime in the bidsmap template, I seem to have only one file per sequence for each exam. If this is the expected behaviour, then we're good and I need to rename all the 218 files. I excepted the same sequence across every exam (e.g a T1w) to be present in the bidseditor only once according to its name, taking into account that I commented two values that can slightly change between exams (TR and TE).

I'm running the bidseditor on all the cohort for the first time, so one file per sequence and per exam could very well yield 218 files, taking into account that the dicomsort sorted a lot of useless files such as : Screenshot.from.2024-01-18.17-49-00.png (view on web) https://github.com/Donders-Institute/bidscoin/assets/48990813/ac5bd8ea-de0e-45da-b02c-26c906617674 Those files, however non dicom format, would still need sorting as "exclude" data.

— Reply to this email directly, view it on GitHub https://github.com/Donders-Institute/bidscoin/issues/211#issuecomment-1898852373, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADTUGL6D72UUYEOF6AMJGJDYPFHFXAVCNFSM6AAAAABAVE6E6KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJYHA2TEMZXGM . You are receiving this because you commented.Message ID: @.***>

rbonicel commented 10 months ago

Ok, then I'll look into that and check why I have so many files, focusing on the MRI sequences and image files.

Another question in that aspect is that I have a lot of PS_ and XX_ files to process. Those files are non DICOM, hence they have very few attributes in the bidseditor (which is normal), and their ProtocolName and SequeceName vary according to the time of the acquisition, which is written in the name. Given that, I understand that I have a lot of files to rename from the same sequence across exams.

Wouldn't it be better to discard those PS_ and XX_ files from the mapping ? Or are those useful for spectro sorting ? (I don't see the point of them in MRI sequences such s T1w or fMRI.

I started renaming all the 216 listed files in the bidseditor. So far it went according to the expected behaviour. I have a entire exam why only SeriesNumber as info, so I needed to name every sequence. Other sequences were only different by the ProtocolName, where a "WIP" was indicated in the field, giving 2 different sequences, e.g. "WIP T2w" and "T2w" referring to the same sequence. So I get why I have so many files.

I just stumble upon the first "duplicated sequence", shown below : Screenshot from 2024-01-19 16-02-00

It concerns the same sequence in two different exams, but one has the SeriesDescription field (probably because it was not anonymized by the Philips process, and the other one doesn't. So I guess the bidseditor will think of those as two different sequences.

This is indeed the main detail that gets me to rename identical sequences : the presence or lack of certain pieces of info in the dicom tag. I could trim the bidsmap template some more to shorten the number of files to edit

rbonicel commented 10 months ago

I renamed all the files that needed it in the bidseditor, and validated the created bidsmap. Screenshot from 2024-01-22 16-25-14 Is it normal that there's a DICOM field in the bids sequence names, as shown above ?

When running bidscoin /raw /bids , the terminal prints that it's "leaving out" all the sequences in the exams, not only the labeled "exclude data". I'm left with empty exam folders in the /bids directory, empty tsv and json files. I have some properly renamed in the bidseditor, though, as shown by the "True" statement when validating the data, so I don't get the absence of conversion.

marcelzwiers commented 10 months ago

I started renaming all the 216 listed files in the bidseditor

That's really not the idea

rbonicel commented 10 months ago

I started renaming all the 216 listed files in the bidseditor

That's really not the idea

I know, but I think it made sense regarding what you told me and my previous comment, about the names (SeriesDescription and/or ProtocolName) slightly changing between sequences, or dicom tags present/absent due to the Philips anon process. I think the bidseditor works fine in that aspect, and it's our protocol that was not properly set up. I did a backup of the created bidsmap, to keep it when using the -f flag by mistake...which I did.

I'm now trying to find out why the bidscoin comment leaves out every single sequence without processing a single file when my bidsmap seems properly configured (i.e not every item falls under the exclude or extra_data type

marcelzwiers commented 10 months ago

I made a fix for your parsing error, but I still need to look into why not all scan types are sorted/found. You can already install and try the latest version like this:

pip install --force-reinstall --no-deps git+https://github.com/Donders-Institute/bidscoin
marcelzwiers commented 10 months ago

The renaming warnings are annoying (because there are so many) but harmless

marcelzwiers commented 10 months ago

Ok, I added another tweak, the latest version should work much better for you

rbonicel commented 10 months ago

Ok, I added another tweak, the latest version should work much better for you

Thanks. I'll try it on the two remaining exams I sent you ? This should fix the sorting of those exams ? What about the "leaving out" of every and all sequences ? Will the tweak fix it too ?

I seem to recall in one of my early tests that once I renamed a few sequences in the bidseditor, saved the bidsmap and reloaded it, every item in the list was labeled as exclude data, which could well explain current behaviour.

marcelzwiers commented 10 months ago

I fixed things on the dataset you sent me, but I'm not sure how you got those hundreds of images in your bidseditor. I didn't get / can't replicate that, so perhaps you can have a look yourself and see what now works and what doesn't. The exclude datatype is fine, it means it will not go into your bids output repository

rbonicel commented 10 months ago

Ok, I'll try fixing the dataset.

The main concern with the exclude datatype is that when running the bidscoin /raw /bids command, it leaves out every single sequence of every single exam, not converting a single thing, although I properly renamed some files. Another related question I have is the following : when I rename the fields on a sequence I want to keep, but related to a file to exclude (a PS_ or XX_ file), will it consider every sequence with the same name as exclude ? Those PS_ or XX_ files are sorted alone in a folder named 000_SequenceNameToKeep, so I figured that they needed to be removed, as files and also sequences. But maybe I'm mistaken, and that explains that all goes to the exclude type ?

This is the last issue I need to fix, and all will be right after.

marcelzwiers commented 10 months ago

Filenames are irrelevant unless you use <fileparts> or so in your bidsmap. The automatic datatyping to exclude has to do with the exclude run-items in the template bidsmap (they are to greedy for you). I would like it if you could test things without making so many hacks, because 1) it should work out of the box anyway, and 2) it becomes very difficult for me to understand what is caused by bidscoin and what is caused by your hacks

marcelzwiers commented 10 months ago

I think your best option would be to run dicomsort first, e.g. like this:

dicomsort rawfolder -i sub- -j ses- -p '.*'
bidsmapper rawfolder bidsfolder
rbonicel commented 10 months ago

Sure thing, sorry about the complexity of the problem. Since this comment https://github.com/Donders-Institute/bidscoin/issues/211#issuecomment-1894208337, I work on the original data. I completely dropped the dataset where I created extra tags when I figured it was my understanding of the commands and their flags that was at fault.

When I got the result that dicomsort created the sorted sequence folders alongside the original data ( DICOM folder and DICOMDIR file) such as : Screenshot from 2024-01-24 17-45-41 I removed the original data before going to the next step. Should I leave it in the folder ? (I was worried it would take duplicated data, both sorted and original folders)

Will run the dicomsort with the -p flag on the remaining data.

Sorry again about the inconvenience.

rbonicel commented 10 months ago
dicomsort rawfolder -i sub- -j ses- -p '.*'

This command with the -p flag sorted all data in the two remaining exams I had (the ones I sent you). Great news, all my exams are now properly sorted !

marcelzwiers commented 10 months ago

Good, and you still have a long- instead of a short-list in the bidseditor?