cloudant / java-cloudant

A Java client for Cloudant
Apache License 2.0
79 stars 68 forks source link

getAllDocsRequestBuilder getDocsAs returns docs that include null and deleted documents #524

Closed josephine-zhuang closed 3 years ago

josephine-zhuang commented 3 years ago

Please read these guidelines before opening an issue.

Bug Description

getAllDocsRequestBuilder getDocsAs returns docs that include null and deleted documents, while getDocs returns docs not included null (not sure about the deleted documents). The implementation between getDocsAs and getDocs are different, so wondering if this is expected. If so, can you provide another method that will behave the same as getDocs that will not return non null or deleted documents.

1. Steps to reproduce and the simplest code sample possible to demonstrate the issue

Make sure that you pass the ids that not existed or deleted in the Cloudant database.

list = databaseSource.getAllDocsRequestBuilder().keys(uniqueIds.toArray(new String[uniqueIds.size()])).includeDocs(true).build()
                    .getResponse().getDocsAs(docType);

Calling this method, it will return all of the docs that matched the id even including the null document and deleted ones

Looking at the code, not sure if this was done on suppose, if so, can you create a method that we can get access to AllDocsRequestResponse.AllDocsValue so that we can filter out the null ones?

It seems like the getDocs method was implemented differently, so wondering why the difference?

getDocsAs implementation:

        @Override
        public <D> D getDocumentAsType(Class<D> docType) {
            D doc = super.getDocumentAsType(docType);
            if (doc == null) {
                // The doc could be deleted, or include docs might be false, we can try to generate
                // a sparse doc for these cases.
                AllDocsRequestResponse.AllDocsValue v = getValue();
                if (v != null) {
                    JsonObject sparse = new JsonObject();
                    sparse.addProperty("_id", getId());
                    sparse.addProperty("_rev", v.getRev());
                    sparse.addProperty("_deleted", v.isDeleted());
                    doc = gson.fromJson(sparse, docType);
                }
            }
            return doc;
        }

getDocs returns non-null documents and probably non-deleted ones as well...

    protected List<Document> internalGetDocs() {
        if (docs == null) {
            docs = new ArrayList<Document>();
            for (Row row : getRows()) {
                Document doc = row.getDocument();
                if(doc != null) {
                    docs.add(doc);
                }
            }
        }
        return docs;
    }

2. What you expected to happen

getDocsAs should only return the non-null document and non-deleted ones, or provide access to the AllDocsRequestResponse.AllDocsValue so that the client can do that.

3. What actually happened

getAllDocsRequestBuilder getDocsAs returns docs that include null and deleted documents.

Environment details

emlaver commented 3 years ago

Hi @josephine-zhuang, when you say null document do you mean a document with only an _id and _rev?

We have a comment in our javadoc.io documentation under the interface AllDocsResponse about the inclusion of deleted documents when using keys for both getDocsAs and getDocs:

Screen Shot 2021-05-21 at 2 23 41 PM

If interested, this is the closed issue related to the work for including deleted documents.

A couple of options for handling deleted docs:

If you're new to using java-cloudant, I'd recommend using our new beta cloudant-java-sdk library which will get the latest features and updates. We also include documentation with a code example for just about every Cloudant endpoint. For example, the _all_docs endpoint: https://cloud.ibm.com/apidocs/cloudant?code=java#postalldocs

josephine-zhuang commented 3 years ago

@emlaver

Hi @josephine-zhuang, when you say null document do you mean a document with only an _id and _rev?

No, the document itself is null... so my workaround was to loop the result of getDocsAs to exclude the null document...

        list = list.stream().filter(document -> document != null)
                .collect(Collectors.toList());

Are you looping or streaming through the getDocsAs in your application? You can use Java streams API to filter and collect documents that do not contain the _deleted field.

Yes, this was going to be my workaround, however, I am using this method for multiple POJO objects, and currently none of them have the field _deleted exists in POJO objects, so I would need to add it to all of the POJO objects.

Create a view with an if block that only returns documents that are not deleted. You can use the getViewRequestBuilder for querying views.

It is good that you brought this up, in getViewRequestBuilder, you have access to the actual ViewResponse as the response, therefore can get to the response.getRows(), is it possible for getAllDocsRequestBuilder to actually return a response similar to getViewRequestBuilder where I can get access to the actual rows? This way, we can decide what to do with the documents ourselves.

If you're new to using java-cloudant, I'd recommend using our new beta cloudant-java-sdk library which will get the latest features and updates.

Unfortunately, we are using java-cloudant in our code for the past 3 years, so it is not possible to migrate to cloudant-java-sdk library at this time.

emlaver commented 3 years ago

No, the document itself is null... so my workaround was to loop the result of getDocsAs to exclude the null document...

Ok, so it sounds like one or more of the document ID strings you pass to keys are for non-existing documents. I'm guessing these doc IDs are for documents that existed at one point?

Yes, this was going to be my workaround, however, I am using this method for multiple POJO objects, and currently none of them have the field _deleted exists in POJO objects, so I would need to add it to all of the POJO objects.

This might be inefficient depending on how many documents you have but you could get the _all_docs result as List<JsonObject> (JsonObject from gson library) and have a loop to skip null/deleted documents and serialize the documents you want into your POJO.

It is good that you brought this up, in getViewRequestBuilder, you have access to the actual ViewResponse as the response, therefore can get to the response.getRows(), is it possible for getAllDocsRequestBuilder to actually return a response similar to getViewRequestBuilder where I can get access to the actual rows? This way, we can decide what to do with the documents ourselves.

Your options for getting documents in the response of getAllDocsRequestBuilder are getDocs and getDocsAs. If you want to use getRows then you'll have to query a view.

You could use our client.executeRequest. It could look something like:

Database myDb = client.database("<your-db>", false);
HttpConnection response = client.executeRequest(Http.GET(
           new URL(myDb.getDBUri().toString() + "/_all_docs?include_docs=true&keys=...")));
// Use try-with-resources or try/finally to ensure stream is closed after use
try (InputStream inputStream = response.responseAsInputStream()) {
     // convert stream to POJO
}
ricellis commented 3 years ago

You can use Java streams API to filter and collect documents that do not contain the _deleted field.

currently none of them have the field _deleted exists in POJO objects, so I would need to add it to all of the POJO objects.

I think you can just take advantage of the Document metadata that is already available to you from getDocs() so that you don't need to add the deleted field to your POJOs, e.g.

// Do the all docs request
AllDocsResponse response = databaseSource.getAllDocsRequestBuilder().keys(uniqueIds.toArray(new String[uniqueIds.size()])).includeDocs(true).build()
        .getResponse();

// Use getDocs to find the list of undeleted doc ids
List<String> undeletedDocIds = response.getDocs().stream().filter(doc -> {return !doc.isDeleted();}).map(Document::getId).collect(Collectors.toList());
// Use the undeleted list to filter getDocsAs
list = response.getDocsAs(docType).stream().filter(doc -> {return undeletedDocIds.contains(doc.getId());}).collect(Collectors.toList());
josephine-zhuang commented 3 years ago

@emlaver @ricellis Thank you both for your responses. I will try out the suggestion from @ricellis first. If that doesn't work, then I will try to build the request on my own using client.executeRequest.