Open sheljohn opened 5 years ago
Wrong user account, sorry! I'll keep using the wrong one to avoid reposting...
Given I've recently been working on #1555, my first instinct is that this same syntax could be used for streamlines data. So one would rename the individual files to conform to the necessary requirements, and then simply use the square-bracket notation when specifying the input track file at the command-line. `Tractography::Reader` would then be responsible for parsing the headers of all input files in order to fill `Tractography::Properties` with the consensus contents, and moving from one file to the next as streamlines data are loaded.
This isn't actually incompatible with providing a text file with a list of file names; it's maybe a little more consistent with existing capabilities, but conversely it's maybe a little less flexible.
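To make the text-file alternative concrete, here is a minimal Python sketch of how a list file could be expanded into individual track file paths. It is an illustration only, not MRtrix3 code: the function name `expand_track_inputs` is hypothetical, the `.lst` extension is the one suggested in the original proposal, and resolving relative entries against the list file's own directory is an assumption.

```python
from pathlib import Path


def expand_track_inputs(path):
    """Return the track files designated by a command-line argument.

    Hypothetical helper: a ".lst" file is assumed to hold one file name
    per line; any other argument is returned unchanged as a single entry.
    """
    path = Path(path)
    if path.suffix != ".lst":
        return [path]

    entries = []
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = Path(line)
        # Assumption: relative entries are relative to the list file itself.
        entries.append(entry if entry.is_absolute() else path.parent / entry)
    return entries
```

A reader built on top of this would open the returned files one after the other, which is the same iteration the square-bracket notation would produce.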
As far as the piping is concerned, my thinking was as follows:
`-` indicates piped data, just as it does for image data (the command-line parsing code already knows which arguments / options correspond to track / image data, so no ambiguity there).
On write, this would create a temporary file in the appropriate location & named as such, just as for piped images. This would be a `.tck` file, but would not include any data after the header; after having created this file, and written its filesystem location to `stdout`, the writer would then start dumping the raw track data on `stdout`.
On read, the location of the temporary file would be read from `stdin`, and the header of that file loaded in order to populate `Tractography::Properties`; this file would then be immediately deleted. The reader would then start reading binary data from `stdin`. The `NaN` and `Inf` delimiters in this stream are sufficient for separating streamlines & detecting end of data.
Don't see any reason why this wouldn't be constrained to Unix only. There wouldn't be any new filetype required: the `-` at the command-line for either read or write of track data would be sufficient.
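As a rough illustration of the last step on the read side, the Python sketch below splits a raw binary stream into streamlines using the NaN and Inf delimiter triplets. It is not MRtrix3 code: the function names are hypothetical, the data are assumed to be little-endian 32-bit floats (the real datatype would come from the header), and the hand-over of the temporary header file is left out.

```python
import io
import math
import struct

# Assumption: vertices are little-endian float32 triplets (Float32LE).
TRIPLET = struct.Struct("<3f")


def read_streamlines(stream):
    """Yield streamlines as lists of (x, y, z) tuples from a binary stream.

    A NaN triplet closes the current streamline; an Inf triplet (or the
    end of the stream) ends the data.
    """
    points = []
    while True:
        chunk = stream.read(TRIPLET.size)
        if len(chunk) < TRIPLET.size:
            break  # stream closed early; an unterminated streamline is dropped
        x, y, z = TRIPLET.unpack(chunk)
        if math.isinf(x):
            break  # end-of-data marker
        if math.isnan(x):
            yield points
            points = []
        else:
            points.append((x, y, z))


def encode_streamlines(streamlines):
    """Writer-side counterpart: serialise streamlines with their delimiters."""
    nan, inf = float("nan"), float("inf")
    out = bytearray()
    for streamline in streamlines:
        for point in streamline:
            out += TRIPLET.pack(*point)
        out += TRIPLET.pack(nan, nan, nan)
    out += TRIPLET.pack(inf, inf, inf)
    return bytes(out)
```

In a piped command the reader would be handed `sys.stdin.buffer` instead of an in-memory buffer; nothing else in the loop depends on where the bytes come from.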
@jdtournier @Lestropie
I have something working here.
There are 3 main commits:
- `readlines` utility (here)
- `Tractography::properties` and adapt commands (here)
- `TrackFileInfo` object, `properties_consensus` function, and extend `Tractography::Reader` (here)
This code:
- `./run_tests` with the same output as the untouched MRtrix3 version cloned from the original repo (actually, both fail certain tests, but the output is the same in both cases).
- `.lst` files in place of `.tck` files and seems to behave as expected with commands such as `tcksift2`.
Please let me know if that would be good for a PR or not :)
FYI, the version that was online this afternoon had a typo in it (which would have made compilation fail); I messed up something with the interactive rebase this morning, and didn't notice it until I pulled it somewhere else and tried to build it there. All should be in order now; 4 commits ahead of master, builds and tests fine.
I went ahead and opened PR #1569: happy to extend / amend / retract.
`properties_consensus` function
I'm still in the process of catching up on this, but just wanted to comment on this bit specifically before I read the rest:
In #1555, I make more extensive use of the `Header::merge()` function to construct the "consensus" header (which is also modified in that PR in order to support its use in this way; previously that function was exclusively for managing the square-bracket notation). It might be preferable, given the functionality for handling multiple instances of `Tractography::Properties` is a very similar operation, to have the same functional interface.
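To illustrate what a "consensus" over several track file headers could look like, here is a small Python sketch. It is not the code from the PR and does not claim to match what `Header::merge()` does: the merge policy (keep only the keys on which all files agree, and sum the `count` field) is an assumption chosen for the example, and headers are modelled as plain string dictionaries.

```python
def properties_consensus(headers):
    """Merge a list of header dictionaries into one consensus dictionary.

    Assumed policy for this sketch:
      - a key is kept only if every header has it with the same value;
      - "count" is the exception: it is summed across all headers,
        and dropped if any header lacks it.
    """
    if not headers:
        return {}

    merged = dict(headers[0])
    for header in headers[1:]:
        for key in list(merged):
            if key != "count" and header.get(key) != merged[key]:
                del merged[key]  # files disagree (or key missing): drop it

    if all("count" in header for header in headers):
        merged["count"] = str(sum(int(header["count"]) for header in headers))
    else:
        merged.pop("count", None)
    return merged
```

Whatever policy is finally chosen, exposing it through the same interface for image headers and track properties would keep the two code paths easy to compare.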
This issue follows a discussion started on the community forum. I would like to submit a proposal for an enhancement, namely the support of multiple input `.tck` files (as opposed to a single one currently) wherever possible.
Problem
Tractography on large datasets is typically run as parallel jobs on computing clusters. There are a number of advantages to dividing the computation of a given number of tracts into smaller batches:
- `.tck` files into a single one (e.g. with `tckedit`) requires -- at least temporarily -- double the disk space, and this becomes problematic for large datasets; the merge operation has to be executed serially in that case.
Unfortunately, most of the commands in MRtrix (and indeed the source code itself) do not support multiple `.tck` files as input.
Proposals
Wherever possible, tractography-related commands should support multiple `.tck` files as input. There are several ways this could be implemented in practice:
0. Variable number of arguments
As with the `tckedit` command for instance. I think this is a bad idea, because extending support for multiple command-line arguments disrupts the current interface of several commands; this implies a lot of replicated effort to modify each command individually, and potentially breaks backwards compatibility. I don't think this is a viable solution.
1. Built-in support for multiple tract files
This is what I would personally prefer, but it involves modifying/extending the existing source code. The idea is to introduce a new file format with extension `.lst`, which contains one filename per line, and detect this extension internally in order to iterate over the files. The commands remain exactly the same, and the change does not necessarily apply only to `.tck` files.
Broadly, this involves a rewrite of the class `MR::DWI::Tractography::Reader`, to include a behaviour similar to the current implementation of `MR::DWI::Tractography::Editing::Loader`.
The difficult parts are:
- `.tck` files. This is pretty much already implemented in the command `tckedit`.
- `.tck` file, or with a single weight file for all `.tck` files. I think the first option is the easiest to implement (and it is the one I chose), but the second option might be more practical because commands called with a list of `.tck` files (e.g. `tcksift2`) would still produce a single output file.
I have started implementing this on a fork of the master branch, and should have a compiling version today. This mainly involves:
- `src/dwi/tractography/properties.(h|cpp)`,
- `__ReaderBase__` in `src/dwi/tractography/file_base.(h|cpp)`,
- `Reader` in `src/dwi/tractography/file.h`,
- `core/mrtrix.h` and ~`core/file/utils.h`,
- `Properties` class in several commands.
2. Piping for streamlines
This relates to the discussion in issue #480. I could not do a better job of summarising this idea than @jdtournier and @Lestropie, so perhaps they can elaborate on my short description; but as far as I understand, this leverages the ability of the host system to stream data, in order to virtually concatenate the streamlines at runtime. I am not sure: