clearlydefined / service

The service side of clearlydefined.io
MIT License

Inconsistent number of Cocoapods definitions in the stats and fetchable through the API #670

Open pombredanne opened 4 years ago

pombredanne commented 4 years ago

Using the API I could only get 365 definitions from: https://api.clearlydefined.io/definitions?type=pod&provider=cocoapods and then following the "continuationToken" to

This last URL returns an empty continuationToken, meaning that's the end of pagination. In contrast, the stats page at https://clearlydefined.io/stats lists 7847 pods, not just 365.

pombredanne commented 4 years ago

@jeffmcaffer FYI this is one of the bugs that impairs mirroring too

nellshamrell commented 3 years ago

I was able to confirm this:

I ran an initial API call, then followed the continuation tokens like this:

curl 'https://api.clearlydefined.io/definitions?type=pod&provider=cocoapods'

curl 'https://api.clearlydefined.io/definitions?type=pod&provider=cocoapods&continuationToken=cG9kL2NvY29hcG9kcy8tL29jdG9raXQvMC4xLjE='

curl 'https://api.clearlydefined.io/definitions?type=pod&provider=cocoapods&continuationToken=cG9kL2NvY29hcG9kcy8tL3VpdGFibGV2aWV3K3B1bGx0b3pvb21pbnRhYmxlaGVhZGVydmlldy8xLjAuMA=='

curl 'https://api.clearlydefined.io/definitions?type=pod&provider=cocoapods&continuationToken=cG9kL2NvY29hcG9kcy8tL3lvdWJvcmFsaWIvNi4yLjM='

curl 'https://api.clearlydefined.io/definitions?type=pod&provider=cocoapods&continuationToken=cG9kL2NvY29hcG9kcy8tL3pncGFyYWxsZWx2aWV3LzAuMS42'

I saved the JSON returned by each of those requests to a separate file, then ran this script to parse the files and count the components returned (not the most elegant thing I've ever written, but it worked).

const fs = require('fs')

// Return the number of definitions in one saved page of API results.
const page_count = (filepath) => {
  const file = fs.readFileSync(filepath, 'utf8')
  const parsed_data = JSON.parse(file)

  return parsed_data.data.length
}

// Sum the counts across the five saved pages.
const total_length = page_count('temp1.json') + page_count('temp2.json') + page_count('temp3.json') + page_count('temp4.json') + page_count('temp5.json')

console.log(`The total number of components is: ${total_length}`)

The total returned was 484.

The current total of CocoaPods definitions listed on the stats page is 8676 as of Jan 19, 2021.
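
The manual steps above (request a page, read its continuationToken, request the next page, count the entries) could also be automated. The following is a minimal sketch, not the service's own tooling. It assumes each response is JSON with a `data` array and a `continuationToken` string, as seen in this thread. The names `countDefinitions` and `fetchJson` are hypothetical, and the global `fetch` assumes Node 18 or later.

```javascript
// Sketch: count every definition reachable by following continuationToken.
// fetchPage is injected so the loop can be exercised without network access.
async function countDefinitions(baseUrl, fetchPage) {
  let total = 0
  let pages = 0
  let token = ''
  do {
    // The first request carries no token; later requests append the previous page's token.
    const url = token
      ? `${baseUrl}&continuationToken=${encodeURIComponent(token)}`
      : baseUrl
    const page = await fetchPage(url)
    // Assumption: definitions are listed under `data`, as in the saved pages above.
    total += Array.isArray(page.data) ? page.data.length : 0
    pages += 1
    // An empty or missing token is treated as the end of pagination.
    token = page.continuationToken || ''
  } while (token)
  return { total, pages }
}

// Hypothetical fetcher; assumes Node 18+, where fetch is a global.
const fetchJson = async (url) => {
  const response = await fetch(url)
  if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`)
  return response.json()
}

// Usage (performs real requests, so it is left commented out):
// countDefinitions('https://api.clearlydefined.io/definitions?type=pod&provider=cocoapods', fetchJson)
//   .then(({ total, pages }) => console.log(`${total} definitions in ${pages} pages`))
```

Running this a few times and comparing the totals with the stats page would show whether the gap is stable or varies between runs.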

qtomlinson commented 1 year ago

Update: I was able to download 9909 definitions from https://api.clearlydefined.io/definitions?type=pod&provider=cocoapods via continuationToken. The number of downloaded definitions was consistent across a few tries. This is close to the current total of 9987 on the stats page as of Jan 18, 2023.
@pombredanne Can you still reproduce the issue?

qtomlinson commented 1 year ago

When sorting is used in searching definitions, the results returned are inconsistent. For example, querying https://api.clearlydefined.io/definitions?type=npm&provider=npmjs&license=NOASSERTION&sort=releaseDate&sortDesc=true returned 1420 definitions on one occasion and 379 definitions on another.
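
To narrow down where two runs of the same sorted query diverge, the collected results could be compared by identity and not only by count. The sketch below assumes each definition carries a `coordinates` object with `type`, `provider`, `namespace`, `name` and `revision` fields. That payload shape is an assumption and should be checked against a real response. The helpers `keyOf` and `diffRuns` are hypothetical names.

```javascript
// Hypothetical helper: derive a comparable key from one definition.
// Assumption: the payload exposes `coordinates`; adjust to the real shape.
const keyOf = (definition) => {
  const { type, provider, namespace, name, revision } = definition.coordinates
  return [type, provider, namespace || '-', name, revision].join('/')
}

// Report what differs between two collected result sets of the same query.
function diffRuns(runA, runB) {
  const keysA = new Set(runA.map(keyOf))
  const keysB = new Set(runB.map(keyOf))
  return {
    onlyInA: [...keysA].filter((key) => !keysB.has(key)),
    onlyInB: [...keysB].filter((key) => !keysA.has(key)),
    // A non-zero value means the same definition was returned more than once.
    duplicatesInA: runA.length - keysA.size,
    duplicatesInB: runB.length - keysB.size
  }
}
```

Duplicates within a run, or entries missing from only one run, would suggest that pages shift while a sorted query is being paged through. That is only one possible explanation, and this thread does not confirm it.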

qtomlinson commented 1 year ago

@pombredanne Can you still reproduce this issue?