SylviaWhittle opened this issue 1 year ago
That is a very late night/early morning investigation!
It is certainly possible but may run into issues of concurrency.
The reason empty dataframes are created at the moment, at least at some stages, is that some stages fail but we ultimately need to return a dataframe from each `process_scan()`, even if it's empty, because these are all `pd.concat()`-ed into a single dataframe.
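Roughly, the pattern looks like this (a minimal sketch with made-up file names and a dummy failure, not the real pipeline):

```python
import pandas as pd

def process_scan(image_path: str) -> pd.DataFrame:
    """Illustrative stand-in for the real process_scan(): every call
    returns a DataFrame, even an empty one, so the final pd.concat()
    never receives a missing result."""
    try:
        if "bad" in image_path:  # stand-in for a stage failing
            raise ValueError("stage failed")
        return pd.DataFrame({"image": [image_path], "area": [1.0]})
    except ValueError:
        return pd.DataFrame()  # empty, but still safe to concatenate

# All per-scan results are combined into a single frame at the end.
results = pd.concat([process_scan(p) for p in ["scan1.spm", "bad.spm"]])
print(results)
```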
The situation you describe shouldn't arise, I don't think, because there is no point attempting to run dnatracing if no grains have been found (I've just checked and we only run `if dnatracing_config["run"]:`, and should probably have an additional check that `grainstats["direction"].shape[0] >= 1`, i.e. at least one grain has been found).
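Something like this combined guard (a sketch using the names from above, with dummy data):

```python
import pandas as pd

dnatracing_config = {"run": True}             # illustrative config
grainstats = pd.DataFrame({"direction": []})  # no grains found

# Only trace when tracing is enabled AND at least one grain exists.
if dnatracing_config["run"] and grainstats["direction"].shape[0] >= 1:
    print("running dnatracing")
else:
    print("skipping dnatracing: no grains")
```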
If the `image` column is missing it can be added to the `ALL_STATISTICS_COLUMN` defined in `utils.py`, although I've just checked and it should be the first column (the index is set to `molecule_number`, so I'm not sure where it's going) and I'm not sure why it's not making it through (line 300 in `grainstats.py` adds the `image` column to an existing dataframe with statistics).
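For illustration, something along these lines (a sketch only; the column list and names here are stand-ins, not the actual code at line 300):

```python
import pandas as pd

# Stand-in for the column list defined in utils.py.
ALL_STATISTICS_COLUMNS = ["image", "molecule_number", "area", "direction"]

statistics = pd.DataFrame({"molecule_number": [0, 1], "area": [1.2, 3.4]})
statistics["image"] = "scan1"  # tag every row with the source image
# Reindexing guarantees every expected column exists (missing ones are
# filled with NaN) and puts "image" first, matching the list order.
statistics = statistics.reindex(columns=ALL_STATISTICS_COLUMNS)
statistics = statistics.set_index("molecule_number")
print(statistics)
```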
Anyway, this is a long-winded way of saying that we would probably hit problems if the empty dataframe were created before processing any files and passed in as an argument to `process_scan()` for use with `GrainStats`, because you would then have the issue of two images wanting to add rows to the dataframe at the same time (I believe this is where things like the Global Interpreter Lock come into play). If a dataframe is to be created up front, I think the best spot to do so would be on instantiating `GrainStats`, and then on lines 268-293 the statistics can be appended.
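As a rough sketch of what that could look like (hypothetical class, not the real `GrainStats`):

```python
import pandas as pd

class GrainStats:
    """Minimal sketch only: the results frame is created once on
    instantiation and appended to per grain, so each GrainStats
    instance owns its own frame and no sharing across processes
    is needed."""

    def __init__(self) -> None:
        self.statistics = pd.DataFrame(columns=["molecule_number", "area"])

    def add_grain(self, molecule_number: int, area: float) -> None:
        # DataFrame.append() was removed in pandas 2.x, so concatenate
        # a one-row frame instead.
        row = pd.DataFrame({"molecule_number": [molecule_number], "area": [area]})
        self.statistics = pd.concat([self.statistics, row], ignore_index=True)

grainstats = GrainStats()
grainstats.add_grain(0, 1.2)
grainstats.add_grain(1, 3.4)
print(grainstats.statistics)
```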
However, on writing that, we would still have the problem that if `GrainStats` failed there would be no `pd.DataFrame` to return and include in `pd.concat()` (although a quick check with `pd.concat([None, pd.DataFrame({"a": [1, 2], "b": [3, 4]})])` suggests this shouldn't be a problem).
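For reference, a runnable version of that check (pandas documents that `None` entries are dropped from the objects passed to `concat`, raising only if *all* of them are `None`):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
# The None entry is silently dropped, so a stage returning None in
# place of a DataFrame would not break the final merge.
print(pd.concat([None, df]))
```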
Hmm, I think I'm :confused: now and would have to look at this in detail and try things out to offer any more thoughts!
I think this has been sorted with the better tracing merger (well... still not tidy but the errors when there's nothing to skeletonise, nodestats, trace or spline are gone). How do people feel about closing this issue?
While refactoring `processing.py`, it seems to me that we could simplify our handling of results dataframes and in turn reduce errors and unexpected indexes. Currently we have many different points where we create empty dataframes, add indexes and reset indexes. I haven't looked into this extensively yet, but it seems we may be able to simplify this to having just one dataframe created with all the necessary indexes, and have that dataframe appended to, as opposed to constant merging of many different dataframes.

For example, as I am editing code, I am finding that DNAtracing is crashing when there are no grains, because after tracing there is no `image` column in the results dataframe. This is because the `image` column only gets added inside `grainstats`, which is an issue if `grainstats` never completes, or is not run (a defensive sketch follows below). I could well be missing some intricacies though!
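A hedged sketch of one possible defensive fix (the frame and filename below are made up, not taken from the codebase):

```python
import pandas as pd

# Results as they might look if grainstats never ran: no "image" column.
results = pd.DataFrame({"molecule_number": [0, 1], "contour_length": [10.2, 8.7]})

# Defensive fix: add the column ourselves before any downstream code
# (or the final concat) assumes it exists.
if "image" not in results.columns:
    results["image"] = "scan1"  # in practice, the filename from the loader
print(results)
```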