Open: tsalo opened this issue 1 year ago.
@tsalo I had an idea for this: what if we took one of the other testing input datasets and resampled it into the coordinates/format of HCP/DCAN/ABCD? Then we'd have an additional sanity check to use as a test: the Pearson correlation coefficients from the original and resampled versions should be very similar.
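A minimal sketch of that sanity check, assuming each run's edge-wise Pearson coefficients have already been extracted and flattened into two matched lists (same parcellation, same edge order). The function names and the 0.9 threshold are hypothetical illustrations, not part of XCP-D or its test suite.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Return the Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        raise ValueError("input has zero variance")
    return num / den


def connectivity_agrees(
    original_edges: Sequence[float],
    resampled_edges: Sequence[float],
    threshold: float = 0.9,  # hypothetical tolerance; would need tuning on real outputs
) -> bool:
    """Check that edge-wise coefficients from the original and resampled runs agree closely."""
    return pearson_r(original_edges, resampled_edges) >= threshold
```

Comparing the two connectivity vectors with a second-level correlation, rather than requiring exact equality, leaves room for the small differences that resampling will inevitably introduce.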
I like that idea, but the other testing datasets have really low-quality data, so if we go that route I think we should replace them entirely.
At minimum, I'd want to replace the ds001419 test dataset, which has fMRIPrep NIfTI and CIFTI derivatives generated and shared by the OpenNeuro team. They didn't do any QC on the data, and I didn't realize that the normalization (at least) was really crappy until after I had added it to the test suite.
We can replace that dataset with a PNC subject.
Maybe create a repo for mocking up test data.
I started working on this in https://github.com/PennLINC/xcp_d_test_data.
Summary
We recently discussed how to test our HCP/DCAN ingression code other than through intermittent full runs on real data on clusters. We decided to try taking some HCP and DCAN data and making it usable as test data. This process involves (1) anonymizing any metadata, (2) scrambling or replacing the actual imaging data with random values, (3) shrinking the datasets by including only the files XCP-D needs, and (4) truncating the time series to retain only about 60 volumes.
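The four steps can be sketched roughly as follows, operating on in-memory stand-ins (a metadata dict, a volumes-by-vertices list of lists, and a list of relative paths) so the example stays self-contained. Reading and writing the actual NIfTI/CIFTI files is not shown. The key list, file patterns, and function names are all hypothetical placeholders, not what xcp_d_test_data actually does.

```python
import fnmatch
import random
from typing import Any

# Hypothetical identifying keys; a real list would come from auditing the sidecars.
IDENTIFYING_KEYS = {"AcquisitionDateTime", "InstitutionName", "DeviceSerialNumber", "PatientID"}

# Hypothetical patterns for files XCP-D would need; the real set depends on the ingression code.
KEEP_PATTERNS = ["*_bold.dtseries.nii", "*_bold.nii.gz", "*_bold.json", "*Movement_Regressors.txt"]

N_VOLUMES = 60  # roughly the number of volumes mentioned in the summary


def anonymize_metadata(metadata: dict[str, Any]) -> dict[str, Any]:
    """Step 1: drop keys that could identify the participant or site."""
    return {k: v for k, v in metadata.items() if k not in IDENTIFYING_KEYS}


def scramble_timeseries(data: list[list[float]], seed: int = 0) -> list[list[float]]:
    """Step 2: replace every value with Gaussian noise while keeping the array shape."""
    rng = random.Random(seed)  # seeded so the fake data is reproducible
    return [[rng.gauss(0.0, 1.0) for _ in row] for row in data]


def select_needed_files(paths: list[str]) -> list[str]:
    """Step 3: keep only paths matching one of the needed patterns."""
    return [p for p in paths if any(fnmatch.fnmatch(p, pat) for pat in KEEP_PATTERNS)]


def truncate_volumes(data: list[list[float]], n_volumes: int = N_VOLUMES) -> list[list[float]]:
    """Step 4: keep only the first n_volumes time points (rows are volumes)."""
    return data[:n_volumes]
```

In practice, any per-volume files (confounds, motion parameters) would presumably need to be truncated to the same length so the mocked dataset stays internally consistent.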
Next steps
tar.gz files and upload to Box.
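A possible sketch of the packaging step using only the standard library's `tarfile` module, assuming each mocked dataset lives in its own directory. The function name is hypothetical, and the Box upload itself is not shown.

```python
import tarfile
from pathlib import Path


def package_dataset(dataset_dir: Path, output_dir: Path) -> Path:
    """Compress one test dataset directory into <name>.tar.gz and return the archive path."""
    archive = output_dir / f"{dataset_dir.name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps member paths relative to the dataset's own top-level folder
        tar.add(dataset_dir, arcname=dataset_dir.name)
    return archive
```

Setting `arcname` means the archive unpacks into a single folder named after the dataset, rather than reproducing the absolute path from the machine that built it.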