yamins81 closed this issue 10 years ago
I believe Darren has already re-engineered the format, so I will have to ask him. I still want a method in the dataset base class that writes out batches efficiently (which probably means not preprocessing one batch at a time, if you want to make good use of multiple CPUs).
I will write up some code that reproduces the two main errors we've been talking about.
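As a rough illustration of the kind of batch-writing method described above, here is a minimal sketch that spreads batch preprocessing over several worker processes. Everything in it is an assumption made for illustration: the names `preprocess`, `write_batch`, and `write_batches`, the `data_batch_<n>` file naming, and the pickled `{"data": [...]}` layout are hypothetical and are not taken from the archconvnets code.

```python
import os
import pickle
from concurrent.futures import ProcessPoolExecutor


def preprocess(item):
    # Hypothetical per-item preprocessing; stands in for image loading/resizing.
    return item * 2


def write_batch(job):
    """Preprocess one batch and pickle it to <out_dir>/data_batch_<n>."""
    out_dir, batch_num, items = job
    path = os.path.join(out_dir, "data_batch_%d" % batch_num)
    with open(path, "wb") as f:
        pickle.dump({"data": [preprocess(x) for x in items]}, f)
    return path


def write_batches(items, batch_size, out_dir, n_workers=2):
    """Split items into batches and write each batch in its own job."""
    jobs = [(out_dir, n, items[i:i + batch_size])
            for n, i in enumerate(range(0, len(items), batch_size))]
    if n_workers <= 1:
        # Serial fallback, handy for debugging.
        return [write_batch(job) for job in jobs]
    # Each worker process preprocesses and writes whole batches independently.
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(write_batch, jobs))
```

When using more than one worker, the call to `write_batches` should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.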
On Friday, November 8, 2013, Dan Yamins wrote:
@ardila @daseibert
In thinking about the dataprovider issue that we discussed yesterday, I think it's important to understand the primary reason why I re-implemented part of the code from the original dataprovider: that you have to provide a method that will read in the data, as well as simply putting the data in batches.
There is no spec for what the data looks like in the batch files. It is entirely up to the user to determine what that format is. Then, you provide a method for reading the data in. What IS specified is what the data looks like once it has been read in. If there is an error in the data provider as I've written it, perhaps it's there. But of course, we have a bunch of tests that seem to show it works exactly as it ought to in the test cases.
So, with regard to the strategy of just writing out batches and then pointing the "standard" dataprovider to that path: It might be possible to re-engineer the written-out format of the data as used by the existing data providers, so that it can be read in using the existing data provider. However, there is no spec'ed out description of what that format is.
I believe the intended use pattern is that the user writes a data provider for their own read-in/write-out format, re-implementing a small part of the data provider code, as we have done. If there's some error there, let's just figure out what it is.
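The use pattern described above (the on-disk format is up to the user, only the read-in format is fixed) could be sketched roughly as follows. This is not the actual data provider code: the class names `BatchDataProvider` and `PickleBatchProvider`, the `read_batch` hook, the `data_batch_<n>` file naming, and the `{"data", "labels"}` in-memory layout are all hypothetical, chosen only to show where a user-written reader would plug in.

```python
import os
import pickle


class BatchDataProvider:
    """Cycles through batch numbers; subclasses decide how a file is read."""

    def __init__(self, data_dir, batch_range):
        self.data_dir = data_dir
        self.batch_range = list(batch_range)
        self.idx = 0

    def read_batch(self, batch_num):
        """Return {'data': [...], 'labels': [...]}; override per on-disk format."""
        raise NotImplementedError

    def get_next_batch(self):
        batch_num = self.batch_range[self.idx]
        self.idx = (self.idx + 1) % len(self.batch_range)
        batch = self.read_batch(batch_num)
        # Only the read-in (in-memory) layout is checked, not the file format.
        if len(batch["data"]) != len(batch["labels"]):
            raise ValueError("batch %d: data/labels length mismatch" % batch_num)
        return batch_num, batch


class PickleBatchProvider(BatchDataProvider):
    """One possible user-defined reader: batches stored as pickled dicts."""

    def read_batch(self, batch_num):
        path = os.path.join(self.data_dir, "data_batch_%d" % batch_num)
        with open(path, "rb") as f:
            return pickle.load(f)
```

A different on-disk format would only need a different `read_batch`; the cycling and the in-memory check stay the same.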
@ardila, if you provide a clear example of where the behavior differs from what you expect, I will work on debugging it -- since I wrote the code to begin with.
Reply to this email directly or view it on GitHub: https://github.com/dicarlolab/archconvnets/issues/7
Great -- I'll look at these examples when you have them -- I want to fix this right away since it seems like a significant stumbling block, and one where we really don't understand what the problem is. Let's chat about the exact structure/use-case when you're in the lab. I'll also interface with Darren to make sure I understand what he's done.
We have solved this issue with the current DLDataProvider.