Closed drorhunvural closed 1 year ago
There are four labels in the file you provided above. The receptor structure is not a label.
Doesn't the first part represent the label, or am I wrong? The first column of the file I mentioned above holds the label, and its values are 1, 2, 3, 4, and 5. The full file has 3000 lines, and each label value accounts for about 20% of them.
Each line starts with four numbers. Those are the labels. There are four of them. If what you want is the number of unique values for a given label, you will need to compute that yourself by iterating over the dataset.
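A minimal sketch of that iteration in plain Python, assuming the .types file is whitespace-separated with the class value in the first column. The helper name count_first_column_values and all sample lines except the first are hypothetical; nothing here is part of the molgrid API.

```python
from collections import Counter


def count_first_column_values(lines):
    """Count how often each distinct value appears in the first column."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        counts[fields[0]] += 1
    return counts


# Hypothetical sample: only the first line comes from this thread,
# the others are made up for illustration.
sample = [
    "4 96.57854128835454 35.182186111609546 47.35858675758119 4zsl_protein_nowat.gninatypes",
    "1 10.0 20.0 30.0 a_protein.gninatypes",
    "1 11.0 21.0 31.0 b_protein.gninatypes",
    "5 12.0 22.0 32.0 c_protein.gninatypes",
]

counts = count_first_column_values(sample)
num_classes = len(counts)  # number of distinct values in the first column

# For a real file, pass the open file object instead of a list:
# with open("train.types") as f:
#     counts = count_first_column_values(f)
```

Here num_classes is the number of distinct classes, which is a different quantity from the number of label columns per line.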
"Each line starts with four numbers." I don't understand what you mean by this sentence. I have uploaded a small sample, which I claim has five different labels. The ".types" file has a total of 104 lines.
Should I understand from your sentence that molgrid only allows up to 4 different labels unless I do something extra?
Edit: My goal is to train a CNN classifier on a dataset that has 5 different classes. I specify the class in the first column of the types file, with values from 1 to 5.
In the answer you gave in #96, you said that the first column represents the label. I am saying that I have increased the number of label values to 5 (1, 2, 3, 4, 5), but you say that I have 4 labels. I do not understand. :)
In that instance the first column was the binary classification label. It wasn't the only label, nor did I say it was. Each line in your input file is an example and each example has four labels.
4 96.57854128835454 35.182186111609546 47.35858675758119 4zsl_protein_nowat.gninatypes
4 - first label
96.57854128835454 - second label
35.182186111609546 - third label
47.35858675758119 - fourth label
Hence, num_labels is 4.
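To make the counting concrete, here is a rough sketch that counts the leading numeric fields of a line. It only illustrates the idea described above and is not molgrid's actual parser; count_leading_labels and is_number are hypothetical helper names.

```python
def is_number(token):
    """Return True if the token parses as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def count_leading_labels(line):
    """Count the numeric fields that come before the first non-numeric field."""
    n = 0
    for token in line.split():
        if not is_number(token):
            break  # first file name reached, labels end here
        n += 1
    return n


line = "4 96.57854128835454 35.182186111609546 47.35858675758119 4zsl_protein_nowat.gninatypes"
num_labels = count_leading_labels(line)  # four numeric columns precede the file name
```

Under this reading, the count depends on how many numeric columns each line has, not on how many distinct values appear in the first column.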
Thank you very much for the information. I thought e.num_labels() was showing the number of different label values in my first column, because until today we had always worked on classification problems with 4 classes. It's good to know this before publishing our paper and referencing the molgrid paper. It is very important that you answer our questions here. I appreciate you guiding us in the right direction.
I have a train.types file with five labels (1, 2, 3, 4, and 5), as shown below. I'm trying to create an ExampleProvider and populate it from this file. The populate step seems to work, judging by the size printed by the code below. My question is, why does e.num_labels() return a value of 4? It has to be 5. Really weird.
Output:
Size: 2374
Num Types: 28
Num Labels: 4 (Wrong!)