Quasars / orange-spectroscopy

OPUS file reader #2

Closed stuart-cls closed 8 years ago

stuart-cls commented 8 years ago

Hi @markotoplak ,

I now have a working version of the opus module I mentioned in my previous email. What do you think is the best way to support opus files? I understand there is a FileFormat class but I foresee two problems with that:

  1. Only supports a single file at a time, which is annoying.
  2. Not sure if that class can handle the hyperspectral cube our module outputs for 3D files.

I know this question is a bit open-ended because I would like your input as an experienced Orange dev, but I will have to start moving on this part soon one way or another so that we can get other parts of the pipe going!

stuart-cls commented 8 years ago

Hi again, I implemented the simplest version: 2D only, single file in owfile.py using the FileFormat class.

The code builds a proper Orange.data.Table but fails somewhere in owfile with 'Table' object is not callable, which I haven't been able to solve yet. Also, I thought the sheets() interface was more generic than it actually is; I was going to use it to select the datablock to import. So more questions:

  1. Can we add a FileFormat like this using an add-on? I only got it to work by pasting it in data.io.
  2. Are we just going to have to build our own file widget anyway? I don't see how we can do multiple files / select datablocks / etc with the current widget.

PS Sorry for the code copy-paste but it doesn't apply to this repo yet:

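# Imports needed when this class lives outside Orange/data/io.py
from Orange.data import ContinuousVariable, Domain, Table
from Orange.data.io import FileFormat


class OPUS2DReader(FileFormat):
    """Reader for OPUS files"""
    EXTENSIONS = ('.0', '.1')
    DESCRIPTION = 'OPUS 2D Spectra'

    def __init__(self, filename):
        super().__init__(filename)

        #import opusFC

    # NOTE: @property belongs to the disabled sheets() below. Left active,
    # it decorates read() instead, so reader.read() ends up calling the
    # returned Table: a likely cause of "'Table' object is not callable".
    #@property
    #@lru_cache(1)
    #def sheets(self):
    #    import opusFC
    #    dbs = []
    #    for db in opusFC.listContents(self.filename):
    #        if db[1] == '2D':
    #            dbs.append(db[0])
    #    return dbs

    def read(self):
        import opusFC
        # Use the selected datablock, falling back to 'SSC'.
        db = self.sheet if self.sheet else 'SSC'
        try:
            data = opusFC.getOpusData(self.filename, db, '2D', 'NONE')
        except Exception as e:
            raise IOError("Couldn't load spectrum from " + self.filename) from e

        clses, metas = [], []

        # One continuous attribute per x value of the spectrum.
        attrs = [ContinuousVariable.make(repr(data.x[i]))
                 for i in range(data.x.shape[0])]

        domain = Domain(attrs, clses, metas)

        # A single spectrum becomes a one-row table.
        return Table.from_numpy(domain,
                                data.y[None, :].astype(float, order='C'))
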
markotoplak commented 8 years ago

Stuart, sorry for my unresponsiveness (I'm on vacation).

You are right, implementation as a FileFormat is preferable.

  1. Yes, we need to be able to open multiple files or even, preferably, a folder. The File widget does not support this, so we would need to extend it. But I have not yet decided which would be better: a separate widget or an extended File widget. The latter ruins the simple semantics of the File widget.
  2. I think 3D files should not be a problem. The FileFormat class only cares about the file (or folder) name and can be made to read anything. What does your module output for 3D files? Janez and I were thinking of formatting data in Orange so that X, Y, and time are just stored as meta attributes, so we would not need special multi-dimensional tables.
  3. The new FileFormat hierarchy was made so that we should be able to easily add new file formats from add-ons, but I am not sure how the file type registration works... I'll try it.
  4. Maybe, but only if we would otherwise need to make the File widget too complicated. The only complication I see for now would be the ability to open multiple files...
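
The layout from point 2 can be sketched as follows. This is only an illustration with NumPy, assuming a hypothetical cube of shape (rows, columns, spectral points); `flatten_cube` is a made-up helper, not part of Orange or opusFC.

```python
import numpy as np


def flatten_cube(cube):
    """Turn a (ny, nx, n_points) cube into a 2D spectra array plus x/y metas.

    Hypothetical helper for illustration; not part of Orange or opusFC.
    """
    ny, nx, n_points = cube.shape
    # One row per pixel, one column per spectral point.
    spectra = cube.reshape(ny * nx, n_points)
    # Pixel coordinates in the same row-major order as the rows above.
    ys, xs = np.indices((ny, nx))
    metas = np.column_stack([xs.ravel(), ys.ravel()])
    return spectra, metas


# Toy cube: 2 rows, 3 columns, 4 spectral points per pixel.
cube = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
spectra, metas = flatten_cube(cube)
```

Each row of `spectra` is then an ordinary spectrum, and the matching row of `metas` records which pixel it came from, so no multi-dimensional table type is needed.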

And yes, we need an interface like the sheet selection to select the datablock, but we will have to extend it. The user will need to see at least the number of profiles for each datablock type.

Unless you would like to dig into the widget, I suggest that you teach me how to use your file reading module and I would work with the widget integration. :)

stuart-cls commented 8 years ago

No problem, enjoy your vacation.

I agree that it would be best for you to work on this. Rather than try to describe the API I can provide you with the module as a wheel and the (somewhat limited) documentation that was originally provided with it. It is fairly self-explanatory. My hope is to improve the documentation to modern standards going forward. There is no other fixed consumer of this module at the moment, so we can change the API to fit our needs here, but I will have to do that part as we can't release the source code.

What environment (os/distribution, 32/64bit, etc) do you use for development? And do you need some example 3D data files?

markotoplak commented 8 years ago

Yes, I would like a 3D file too, please. I am using 64-bit Ubuntu 14.04.

markotoplak commented 8 years ago

Re 3.) To register the file format in Orange, you would need to import the file containing the OPUS2DReader implementation from orangecontrib/infrared/__init__.py.
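
Why an import can be enough is easiest to see in a minimal, self-contained sketch. It assumes readers register themselves at the moment their class is defined; the `Reader` base class, the `registry` dict, and `reader_for` are hypothetical stand-ins, and Orange's actual FileFormat registration may work differently.

```python
registry = {}  # maps file extension -> reader class (hypothetical)


class Reader:
    """Stand-in base class for illustration; NOT Orange's FileFormat."""
    EXTENSIONS = ()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs when a subclass is defined, i.e. when its module is imported.
        for ext in cls.EXTENSIONS:
            registry[ext] = cls


class OPUSReader(Reader):
    EXTENSIONS = ('.0', '.1')


def reader_for(filename):
    """Pick a reader class by extension, or None if nothing is registered."""
    for ext, cls in registry.items():
        if filename.endswith(ext):
            return cls
    return None
```

Under this assumption, a reader class that is never imported never reaches the registry, which is why the import belongs in the package's `__init__.py`.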

stuart-cls commented 8 years ago

Okay, this works great! I put a working prototype in the opus-2D-reader branch. Unfortunately, the Orange owfile sheets picker is Excel-specific when filling the combo box (most other parts are abstracted), so I can't get the datablock picking working properly, which makes the whole thing much less useful.
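
What the picker needs from the reader is small: a list of datablock names to offer. A sketch, assuming each entry of the file listing is a tuple whose first element is the datablock name and whose second is its dimension (as the commented-out sheets() code in the earlier paste suggests); the `contents` list here is made up for illustration, and `datablocks` is a hypothetical helper.

```python
def datablocks(contents, dimension=None):
    """Return datablock names, optionally restricted to one dimension."""
    return [entry[0] for entry in contents
            if dimension is None or entry[1] == dimension]


# Made-up listing; real files will contain different blocks.
contents = [('SSC', '2D'), ('AB', '2D'), ('TRC', '3D')]

names_2d = datablocks(contents, '2D')
all_names = datablocks(contents)
```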

stuart-cls commented 8 years ago

I did a proof-of-concept 3D loader as well and integrated automatic dimension-selection so the loader will now work for "any" OPUS file. See opus-reader.

This led me to try a large map file (4.55 GB) which of course can't be loaded into memory in 32-bit Python and crashes...
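
A rough check makes the limit concrete. This sketch only compares a size against the theoretical address space for a given pointer width; the memory actually usable by a 32-bit process is lower still. `fits_in_address_space` is a hypothetical helper, and the file size is taken as decimal gigabytes.

```python
import sys


def fits_in_address_space(size_bytes, pointer_bits):
    """True if size_bytes is below the 2**pointer_bits addressing limit."""
    return size_bytes < 2 ** pointer_bits


map_size = int(4.55e9)  # 4.55 GB, assumed to mean decimal gigabytes

# sys.maxsize is 2**31 - 1 on 32-bit Python and 2**63 - 1 on 64-bit builds.
running_64bit = sys.maxsize > 2 ** 32

fits_32 = fits_in_address_space(map_size, 32)
fits_64 = fits_in_address_space(map_size, 64)
```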

markotoplak commented 8 years ago

Stuart, with Orange updates from the end of last week your reader and sheet selection work. I only made slight changes as shown in https://github.com/markotoplak/orange-infrared/tree/opustemp2

There was a problem with one of the 3D output types for one of the files you sent but I did not debug it.

For now it will only work with a nightly build of Orange (from http://orange.biolab.si/download/files/nightly/) or a build from GitHub. But these changes will definitely go into the next official release.

stuart-cls commented 8 years ago

Very nice! Works really well for me.

Looks like it's the TRC (trace) dataset type, I will look into that. It's a special type of some kind.

If you have any thoughts about the opusFC API I would love to hear them, maybe if there's time during the meeting tomorrow.

markotoplak commented 8 years ago

I think the next step would be to put opusFC binary wheels for Mac and Windows on PyPI, so that at least users on those platforms could install opusFC through pip.

As far as I know, Linux wheels are not supported by PyPI.

stuart-cls commented 8 years ago

I'm pretty happy with the current implementation in #34 so I'm going to close this. We can open specific issues as necessary.