Issue with channel name

aschoenauer-sebag commented 9 years ago

Hi,

We're having an issue with the channel name.

Naming scheme:

[***]
file_extensions = .tiff .tif .png
regex_subdirectories = ^[^_].*
regex_filename_substr = (.+?\.)
regex_dimensions =  .*?--W(?P<well>\d+)--P(?P<subwell>\d+)--.*?--T(?P<time>\d+)--(?P<channel>.+?)\.
timestamps_from_file = mtime
use_frame_indices = True

Example filename: I10--W00202--P00001--Z00000--T00020--Cy3.tif
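To see what the regex_dimensions pattern extracts from this example filename, you can run it directly with Python's re module. This is a minimal sketch. It assumes the pattern is matched from the start of the filename with re.match; the thread does not show how cecog applies it internally. Every captured value is a string, and the channel group holds the channel name ('Cy3'), not a numeric ID such as '1'.

```python
import re

# regex_dimensions copied verbatim from the naming scheme above.
REGEX_DIMENSIONS = (
    r".*?--W(?P<well>\d+)--P(?P<subwell>\d+)--.*?"
    r"--T(?P<time>\d+)--(?P<channel>.+?)\."
)


def parse_dimensions(filename, pattern=REGEX_DIMENSIONS):
    """Return the named groups captured from filename, or None if it does not match."""
    match = re.match(pattern, filename)
    return match.groupdict() if match else None


dims = parse_dimensions("I10--W00202--P00001--Z00000--T00020--Cy3.tif")
print(dims)  # {'well': '00202', 'subwell': '00001', 'time': '00020', 'channel': 'Cy3'}
```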
Branch: master, commit 77db1005b5cc1443b9ba4b752fee1b0a75226e92
What we are trying to do: extracting object features alone fails on a remote machine, although it works both with and without the GUI on my computer (everything else being the same: commit, etc.). The exact error:


Traceback (most recent call last):
  File "cecog_batch.py", line 255, in <module>
    hdf_links = analyzer.processPositions()
  File "~/software/lib/python2.7/site-packages/cecog/analyzer/core.py", line 210, in processPositions
    nimages = analyzer()
  File "~/software/lib/python2.7/site-packages/cecog/analyzer/position.py", line 728, in __call__
    n_images = self._analyze(ca)
  File "~/software/lib/python2.7/site-packages/cecog/analyzer/position.py", line 852, in _analyze
    cellanalyzer.process()
  File "~/software/lib/python2.7/site-packages/cecog/analyzer/analyzer.py", line 87, in process
    self.timeholder.prepare_raw_image(channel)
  File "~/software/lib/python2.7/site-packages/cecog/analyzer/timeholder.py", line 742, in prepare_raw_image
    channel.normalize_image(self.plate_id)
  File "~/software/lib/python2.7/site-packages/cecog/analyzer/channel.py", line 302, in normalize_image
    img_in = self.meta_image.image
  File "~/software/lib/python2.7/site-packages/cecog/io/metadata.py", line 261, in image
    return self._raw_image
  File "~/software/lib/python2.7/site-packages/cecog/io/metadata.py", line 268, in _raw_image
    self._img = self.image_container.get_image(self.coordinate)
  File "~/software/lib/python2.7/site-packages/cecog/io/imagecontainer.py", line 173, in get_image
    return self._importer.get_image(coordinate)
  File "~/software/lib/python2.7/site-packages/cecog/io/importer.py", line 123, in get_image
    [coordinate.channel] \
KeyError: '1'

Thomas and I have been trying to understand this bug for some time now. Does anybody know where it could come from? Thanks a lot for the help, Alice

sommerc commented 9 years ago

Two questions come to mind. Is the image data 'scanned' on the remote machine or on the local machine? If it is scanned on the remote machine, does that machine run a different OS or a different Python version?

The dimension_lookup is created while scanning the plates and then stored (so it can be loaded quickly later). It might be that a string is interpreted differently, e.g., unicode vs. bytestring.
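A stored lookup can differ from a freshly scanned one because XML keeps everything as text. Keys that were integers before saving come back as strings after reloading. The sketch below round-trips a small nested dict through the standard library's xml.etree.ElementTree (Python 3) to show the effect. The {position: {time: {channel: filename}}} layout and the element names are hypothetical. This is not the format written by cecog's xmlserializer.py.

```python
import xml.etree.ElementTree as ET


def lookup_to_xml(lookup):
    """Serialize a hypothetical {position: {time: {channel: filename}}} dict to XML."""
    root = ET.Element("lookup")
    for position, times in lookup.items():
        pos_el = ET.SubElement(root, "position", name=str(position))
        for time, channels in times.items():
            time_el = ET.SubElement(pos_el, "time", value=str(time))
            for channel, filename in channels.items():
                ET.SubElement(time_el, "channel", name=str(channel), file=filename)
    return ET.tostring(root, encoding="unicode")


def xml_to_lookup(text):
    """Rebuild the nested dict; attribute values, and hence all keys, are str."""
    lookup = {}
    for pos_el in ET.fromstring(text):
        times = lookup.setdefault(pos_el.get("name"), {})
        for time_el in pos_el:
            channels = times.setdefault(time_el.get("value"), {})
            for ch_el in time_el:
                channels[ch_el.get("name")] = ch_el.get("file")
    return lookup


original = {"W00202_P00001": {20: {"Cy3": "I10--W00202--P00001--Z00000--T00020--Cy3.tif"}}}
restored = xml_to_lookup(lookup_to_xml(original))
print(20 in original["W00202_P00001"])  # True
print(20 in restored["W00202_P00001"])  # False: the key is now the string '20'
```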

ThomasWalter commented 9 years ago

The image data is freshly scanned on the remote machine, and the corresponding XML files are generated there. It is the same OS that Alice uses (Linux, but a different distribution). The Python version differs (2.7.3 on the remote system, 2.7.9 on mine).

I also tested on a different CellCognition installation, and the error persists.

I also noticed that, both locally and remotely, the XML files are not replaced even when I request a rescan. For instance, a first scan generates the XML file. If I then change the naming scheme and rescan, the new rules are not applied. They only take effect after I manually delete the XML file. I do not know whether this is related, though.
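Until a rescan regenerates the file on its own, the manual workaround described above can be scripted. This is a minimal sketch. It assumes the scan XML files sit directly in a directory you pass in; the thread does not give their actual location. By default it only lists what it would delete, so other XML files in that folder are not removed by accident.

```python
from pathlib import Path


def remove_cached_xml(directory, dry_run=True):
    """List (dry_run=True) or delete (dry_run=False) the *.xml files in directory."""
    found = sorted(Path(directory).glob("*.xml"))
    if not dry_run:
        for path in found:
            path.unlink()
    return [path.name for path in found]
```

For example, `remove_cached_xml("/path/to/scan/xml")` first, to check the list, then the same call with `dry_run=False` before rescanning. The path is a placeholder.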

ThomasWalter commented 9 years ago

Might be an issue with lxml ... I will check the versions.

sommerc commented 9 years ago

hmm, weird. Yeah, also looking at xmlserializer.py right now :)

sommerc commented 9 years ago

I have:

>>> from lxml import etree
>>> etree.LXML_VERSION
(3, 2, 3, 0)
>>> etree.LIBXML_COMPILED_VERSION
(2, 9, 1)

ThomasWalter commented 9 years ago

Thanks.

I have:

In [2]: etree.LXML_VERSION
Out[2]: (3, 4, 1, 0)

In [3]: etree.LIBXML_COMPILED_VERSION
Out[3]: (2, 9, 2)

So I guess this is not the cause...

ThomasWalter commented 9 years ago

Did you have a similar problem previously?

sommerc commented 9 years ago

No. Still, the way types are generated in xmlserializer.py could be the problem. Could you debug this, e.g., check which keys are in the nested dict and what their types are? My guess is that this will lead us to an int vs. str issue in xmlserializer.
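For the debugging suggested here, a small helper can walk a nested dict and report every key together with its type. That makes int vs. str (or unicode vs. str) mismatches easy to spot. This sketch avoids Python-3-only syntax, so it should also run under the Python 2.7 interpreter seen in the traceback. The sample dict is made up. In practice you would pass something like importer.dimension_lookup.

```python
def key_types(nested, path=()):
    """Yield (path, key, type name) for every key in a nested dict."""
    for key, value in nested.items():
        yield path, key, type(key).__name__
        if isinstance(value, dict):
            # Recurse without 'yield from' so this also works on Python 2.7.
            for item in key_types(value, path + (key,)):
                yield item


# Made-up sample: the time point is an int key, the channel is keyed by name.
sample = {"W00202_P00001": {20: {"Cy3": "I10--W00202--P00001--Z00000--T00020--Cy3.tif"}}}
for path, key, type_name in key_types(sample):
    print("%r %r %s" % (path, key, type_name))
```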

If you send me a minimal dataset, I could try to reproduce it here...

ThomasWalter commented 9 years ago

OK, thanks a lot for the hint; we will try to debug it this afternoon.

Alice could send you a dataset, but since it works locally on both her machine and mine, I suspect we have to solve the issue on our cluster (you would probably not be able to reproduce the error anyway).

aschoenauer-sebag commented 9 years ago

Here is a mini dataset: http://cbio.ensmp.fr/~aschoenauer/mini_ds.zip just in case you have five minutes to give it a try.

Thanks!

sommerc commented 9 years ago

Thanks! I tried it on the current master branch. Scanning, reopening, and processing all work like a charm.

Shall I send you my XML file? Perhaps we can learn something from it. Or are the XML files you generated on the local and remote machines identical anyway?

aschoenauer-sebag commented 9 years ago

OK, so I have:

1. changed the regular expression to

regex_dimensions = .*?--W(?P<well>\d+)--P(?P<subwell>\d+)--.*?--T(?P<time>\d+)--.*?

2. manually changed the ID of the primary channel in the conf file to 1. I think it changed from 00 to 1 between versions, since the configuration file I started with in 2012 has

primary_channelid = 00

whereas the latest configuration file has

primary_channelid = 1

I think the actual channel name in the dataset (such as Cy3 or b_cy3, depending on the plate) is never used by CellCognition as a coordinate, which uses channel IDs instead. It is nevertheless used as a key in the importer.dimension_lookup dictionary. So I think there is a mismatch between these two containers, and that mismatch causes the KeyError in the first place. Indeed, when I switched back to the first regular expression, I found 'Cy3' among the keys of self.dimension_lookup, whereas coordinate.channel holds the value of primary_channelid set in the configuration file.
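The mismatch described here can be reproduced in a few lines. One possible workaround is to translate the configured channel ID into the channel name before the lookup. This is a minimal sketch with the dict flattened to the channel level. The CHANNEL_ID_TO_NAME mapping and the lookup_file helper are hypothetical illustrations, not part of cecog.

```python
# Hypothetical lookup built with the first regex: channel *names* are the keys.
dimension_lookup = {"Cy3": "I10--W00202--P00001--Z00000--T00020--Cy3.tif"}

# coordinate.channel carries primary_channelid from the configuration instead.
primary_channelid = "1"

try:
    dimension_lookup[primary_channelid]
except KeyError as err:
    print("KeyError: %s" % err)  # same key as in the traceback: '1'

# Hypothetical mapping from configured channel IDs to the names in the filenames.
CHANNEL_ID_TO_NAME = {"1": "Cy3"}


def lookup_file(lookup, channel_id, id_to_name=CHANNEL_ID_TO_NAME):
    """Translate a channel ID to its name (if mapped) before indexing lookup."""
    return lookup[id_to_name.get(channel_id, channel_id)]


print(lookup_file(dimension_lookup, primary_channelid))
```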

Alice

ThomasWalter commented 9 years ago

That's interesting ...

But what I do not understand is why the heck this works on your local machine (Linux) and on mine (Mac), but not on the cluster, even though we checked and double-checked that we are using exactly the same conf files, branch, commit, and naming schemes. That's really a mystery.

rhoef commented 9 years ago

Hi,

do both CellCognition installations see the same mounts? In the past we had a problem where some images were missing or corrupted after the data was copied, and scanning the image directory ended in undefined behaviour.

Best regards, Rudi
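If missing images were the cause, they would typically show up as gaps in the time series of a well. Below is a minimal sketch of such a check, reusing the regex_dimensions pattern from the top of this issue. It groups filenames by well, subwell, and channel and reports the time indices missing between the first and last frame found. It assumes the pattern is matched with re.match. It does not detect series truncated at either end, and it does not detect corrupted files.

```python
import re
from collections import defaultdict

# regex_dimensions from the original naming scheme (with the channel group).
REGEX_DIMENSIONS = re.compile(
    r".*?--W(?P<well>\d+)--P(?P<subwell>\d+)--.*?"
    r"--T(?P<time>\d+)--(?P<channel>.+?)\."
)


def find_gaps(filenames):
    """Map (well, subwell, channel) to the sorted time indices missing in between."""
    times = defaultdict(set)
    for name in filenames:
        match = REGEX_DIMENSIONS.match(name)
        if match:  # silently skip files that do not follow the naming scheme
            key = (match.group("well"), match.group("subwell"), match.group("channel"))
            times[key].add(int(match.group("time")))
    gaps = {}
    for key, seen in times.items():
        missing = sorted(set(range(min(seen), max(seen) + 1)) - seen)
        if missing:
            gaps[key] = missing
    return gaps
```

On a real plate you would pass something like `os.listdir()` of one position's image folder. The folder layout is an assumption here.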


ThomasWalter commented 9 years ago

No, they do not see the same mounts. But we tested on small data sets (1-2 wells), so we are sure that it is not due to missing data.

CellCognition / cecog

Issue with channel name #203