Open skilgall opened 8 years ago
Can you share the input and output of your call to predict?
Cesar
On Fri, Jul 8, 2016 at 11:52 AM, skilgall notifications@github.com wrote:
I followed your tutorial and wanted to apply dsstne to a different project. It seems that the only output of the predict method is one that generates recommendations with the trained net and I want classification output.
I tried training a feedforward network with the output layer being classification data to all my training instances, but the output it generated doesn't seem right. I am hoping there is a predict method call for this situation.
1. Is there a way to produce classification output from a trained network?
2. Is there a max number of features for a training instance? (I tried 20,000 initially but got a 'std::bad_alloc' error; 10,000 produced no error.)
Input file looks like this:

Document1 1:1:0:0:0...
Document2 1:0:1:0:0...

Output file looks like this:

Document1 1
Document2 0
I use the generateNetCDF method the same way as the tutorial, only using the output file in the output call. I've tried two different versions of the predict call:

predict -b 1024 -d gl -i features_input -o features_output -k 10 -n gl.nc -s recs -r input_file -f input_file
predict -b 1024 -d gl -i features_input -o features_output -n gl.nc -s recs -r input_file -f input_file -l Output
Either way I get an output in the recs file that looks like this for all documents:

Document1 1,0.000:0,0.000:
Document2 1,0.000:0,0.000:
Do you have any suggestions to get non zero output from the net?
Is there anything wrong with my call to predict?
Thanks in advance
I think there is an issue with the way you have been training, since DSSTNE is optimized for sparse kernels. You just need to pass the indexes of the features that are 1; you don't need to pass the features that are zero. In the example you have given, indexes 1 and 0 are assumed to be active all the time. Instead, can you pass only the indexes which are 1?
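To make the suggested conversion concrete, here is a minimal illustrative sketch (not part of DSSTNE; the tab separator and 1-based indexing are assumptions based on the examples in this thread) that turns a dense 0/1 feature row into a sparse row listing only the indexes that are on:

```python
def dense_to_sparse(line):
    """Convert 'DocID<TAB>v1:v2:v3:...' with 0/1 values into
    'DocID<TAB>i1:i2:...' listing only the 1-based indexes whose value is 1."""
    doc_id, values = line.rstrip("\n").split("\t")
    active = [str(i) for i, v in enumerate(values.split(":"), start=1) if v == "1"]
    return doc_id + "\t" + ":".join(active)

# The dense row from the first post: features 1 and 2 are on.
print(dense_to_sparse("Document1\t1:1:0:0:0"))  # Document1	1:2
```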
I took your advice and, following the example input data from my first post, converted it to:

Document1 1,1:2,1:...
Document2 1,1:3,1:...
I've tried multiple output formats; this one is the only one that returns output other than all 1s or 0s:

Document1 1,1 <- Class value 1
Document2 2,1 <- Class value 2

(I've also tried these formats with no success:

Document1 1,1:2,0
Document2 1,0:2,1

and

Document1 1,0
Document2 1,1)
This gives the output 1,0.788:2,0.992: for every single instance. (The other formats give 1,1.000:2,1.000 for all instances, or 1,1.0000 for every instance.) The number of epochs affects the output in the wrong way: the error goes up when epochs increase from 10 to 100.
Can you see anything wrong with my methods? I don't understand how the output should be formatted in a classification example and especially that I am getting the same prediction for each instance.
Can you try using the following as input:

Document1 1:2
Document2 1:3

And the following as output:

Document1 1
Document2 2
Ensure that the separator between the document ID and the features is a tab. Also, can you send us the command you tried and attach your sample document?
I tried that exact input and output file format and received this for every instance:

1,0.000:2,0.000:
These are the commands that I have been using:

generateNetCDF -d gl_input -i inputfile -o gl_input.nc -f features_input -s samples_input -c
generateNetCDF -d gl_output -i outputfile -o gl_output.nc -f features_output -s samples_input -c
train -c config.json -i gl_input.nc -o gl_output.nc -n gl.nc -b 256 -e 10
predict -b 1024 -d gl -i features_input -o features_output -k 10 -n gl.nc -s recs -r inputfile -f inputfile
Attached are my input and output files: Archive.zip
You have an interesting case here. You have 1000 input features, of which an average of 256 are on for a given datapoint. I am guessing the sparse kernels here will not behave efficiently, but I do believe I can detect this situation and still keep storage efficient. I am working on a simple program to build your data set correctly.
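One way to sanity-check whether a dataset is sparse enough for these kernels — an illustrative sketch, not part of DSSTNE — is to measure its average density, i.e. the fraction of features active per example:

```python
def average_density(lines, num_features):
    """Average fraction of active features per example, given
    'DocID<TAB>i1:i2:...' sparse rows (indexes of features that are on)."""
    counts = [len(line.rstrip("\n").split("\t")[1].split(":")) for line in lines]
    return sum(counts) / (len(counts) * num_features)

# Two synthetic rows mimicking this case: 256 of 1000 features on per example.
rows = ["Doc1\t" + ":".join(str(i) for i in range(1, 257)),
        "Doc2\t" + ":".join(str(i) for i in range(1, 257))]
print(average_density(rows, 1000))  # 0.256 — much denser than a typical sparse workload
```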
So since I don't have write access to the github repo, I'm attaching a short program to create DSSTNE-compatible data.
Observations:
dparse.cpp and a slightly modified config_1000.json are attached here. To build dparse, type:
g++ -o dparse dparse.cpp -lnetcdf_c++4 -lnetcdf -lm -std=c++0x -L
config_1000.json has been changed to use "input" and "output" as the dataset names.