benmack / oneClass

One-class classification in the absence of test data.
GNU Affero General Public License v3.0

why maxent can not work? #4

Open lipinu opened 5 years ago

lipinu commented 5 years ago

Hi Benjamin,

Following the example, I ran into a problem when using the maxent method. I just replaced "ocsvm" with "maxent", but it did not work.

ocsvm.fit <- trainOcc(x=tr.x, y=tr.y, method="maxent", index=tr.index)
Error in { : 
  task 1 failed - "arguments imply differing number of rows: 0, 

Could you help me with that? Cheers Pulni

benmack commented 5 years ago

Hi Pulni,

can you please give me some more information:

Ben

lipinu commented 5 years ago

Hi Ben, many thanks for your reply! The error occurred with the package test data. Below is the full code:

data(bananas)
seed <- 123456
tr.x <- bananas$tr[, -1]
tr.y <- puFactor(bananas$tr[, 1], positive=1)
set.seed(seed)
tr.index <- createFolds(tr.y, k=10, returnTrain=TRUE)
set.seed(seed)
te.i <- sample(ncell(bananas$y), 1000)
te.x <- extract(bananas$x, te.i)
te.y <- extract(bananas$y, te.i)
ocsvm.fit <- trainOcc(x=tr.x, y=tr.y, method="maxent", index=tr.index)

Error in { : task 1 failed - "arguments imply differing number of rows: 0, 52"
In addition: There were 18 warnings (use warnings() to see them)

I just replaced "ocsvm" with "maxent". I have tried many times and tested it on different computers, but the same error occurred. The other methods (ocsvm and bsvm) work well. P.S. I installed all the required packages (e.g., "rJava") successfully. Cheers Pulni

benmack commented 5 years ago

Hi Pulni,

to be honest, I have no idea what is happening there, and at the moment I do not have much time. I found the following problem when using dismo::maxent directly:

When I run maxent for the first time, in a way similar to how it is used in the oneClass package, it runs perfectly:

> data(bananas)
> seed <- 123456
> tr.x <- bananas$tr[, -1]
> tr.y <- puFactor(bananas$tr[, 1], positive=1)
> set.seed(seed)
> tr.index <- createFolds(tr.y, k=10, returnTrain=TRUE)
> set.seed(seed)
> te.i <- sample(ncell(bananas$y), 1000)
> te.x <- extract(bananas$x, te.i)
> te.y <- extract(bananas$y, te.i)
> 
> maxent(x=tr.x, p=ifelse(tr.y=="pos", 1, 0))
Loading required namespace: rJava
class    : MaxEnt 
variables: x1 x2 
Warning message:
In matrix(as.numeric(d)) : NAs introduced by coercion

But then, if I run the exact same code again, I get the following error:

> maxent(x=tr.x, p=ifelse(tr.y=="pos", 1, 0))
Warning: Extra fields in /tmp/RtmpX8qbXd/raster/maxent/10264563173/presence line 2: skipping... 
Warning: Extra fields in /tmp/RtmpX8qbXd/raster/maxent/10264563173/presence line 3: skipping... 
Warning: Extra fields in /tmp/RtmpX8qbXd/raster/maxent/10264563173/presence line 4: skipping... 
Warning: Extra fields in /tmp/RtmpX8qbXd/raster/maxent/10264563173/presence line 5: skipping... 
Warning: Extra fields in /tmp/RtmpX8qbXd/raster/maxent/10264563173/presence line 6: skipping... 
Warning: Extra fields in /tmp/RtmpX8qbXd/raster/maxent/10264563173/presence line 7: skipping... 
Warning: Extra fields in /tmp/RtmpX8qbXd/raster/maxent/10264563173/presence line 8: skipping... 
Warning: Extra fields in /tmp/RtmpX8qbXd/raster/maxent/10264563173/presence line 9: skipping... 
Warning: Extra fields in /tmp/RtmpX8qbXd/raster/maxent/10264563173/presence line 10: skipping... 
Warning: Extra fields in /tmp/RtmpX8qbXd/raster/maxent/10264563173/presence line 11: skipping... 
Warning: Extra fields in /tmp/RtmpX8qbXd/raster/maxent/10264563173/presence line 12: skipping... 
Warning: Extra fields in /tmp/RtmpX8qbXd/raster/maxent/10264563173/presence line 13: skipping... 
Warning: Extra fields in /tmp/RtmpX8qbXd/raster/maxent/10264563173/presence line 14: skipping... 
Warning: Extra fields in /tmp/RtmpX8qbXd/raster/maxent/10264563173/presence line 15: skipping... 
Warning: Extra fields in /tmp/RtmpX8qbXd/raster/maxent/10264563173/presence line 16: skipping... 
Warning: Extra fields in /tmp/RtmpX8qbXd/raster/maxent/10264563173/presence line 17: skipping... 
Warning: Extra fields in /tmp/RtmpX8qbXd/raster/maxent/10264563173/presence line 18: skipping... 
Warning: Extra fields in /tmp/RtmpX8qbXd/raster/maxent/10264563173/presence line 19: skipping... 
Warning: Extra fields in /tmp/RtmpX8qbXd/raster/maxent/10264563173/presence line 20: skipping... 
Warning: Extra fields in /tmp/RtmpX8qbXd/raster/maxent/10264563173/presence line 21: skipping... 
Error: No species selected
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
  cannot open file '/tmp/RtmpX8qbXd/raster//maxent/10264563173/species.lambdas': No such file or directory

I can only run it again successfully if I restart R. That is of course not good, since in oneClass we run the command many times during cross-validation.

This is very strange behavior, and I am not sure whether the dismo::maxent function has changed and we should now use it differently, or whether there is some other problem. Anyway, I believe that this is the error that occurs internally in oneClass. That said, we should first find out what the problem is with the code here (calling maxent directly) and how to make it run.

Unfortunately, I do not have much time for that in the foreseeable future, but if you manage to find out what is going on there, I might fix it in the oneClass package.

Let me know if you feel like looking into the problem in more detail or contacting the dismo maintainers about it.

lipinu commented 5 years ago

Hi, Ben.

I do not think the problem is with the dismo package, because I can run it successfully. I will keep looking into the problem.

In addition, I want to ask you a question: how do you define an OCC algorithm in the field of remote sensing? OCC algorithms can be divided into two categories according to the training samples they use: 1. only presence points are used (SVDD, OCSVM); 2. both presence and background points are used (BSVM, PUL, maxent). However, in the field of species distribution modelling, many people also call some binary classifiers (RF, SVM, BRT) OCC models. Instead of using background points, they use pseudo-absence points generated with some technique. Can these be called OCC algorithms?

I would appreciate it if you could help me with that.

best regards! Pulni

benmack commented 5 years ago

I would be happy if you could share more about the problem in case you find something out. I will probably not have the time to look into it in the near future.

About your other question. First, a general remark: an issue thread is not the place to start discussing something unrelated to that issue. If you think this conceptual question is worth discussing in a repository issue, you should open another issue. However, I can already tell you that I do not think this is a good place for it, and I would probably not answer such a conceptual question here in detail. Maybe ResearchGate is a good platform to discuss it.

My personal short answer, however, would be: if you talk about OCC as a learning paradigm, you can call every specific approach / method / implementation an OCC algorithm as long as you try to separate a class of interest from the rest and do not use supervised reference data from the rest classes (this includes working without any rest-class samples, or with unlabeled or artificially generated rest samples). Still, some people use OCC to refer only to methods like the OCSVM or SVDD, and that can of course make perfect sense for their purpose. I think you should choose the terms and definitions that make sense for you and state what you mean.