DLR-SC / GitLab-Corpus

Creates a corpus for publicly accessible repositories in a GitLab instance.

does not find all projects #3

Closed StephanJanosch closed 2 years ago

StephanJanosch commented 2 years ago

When I log into GitLab via the browser, it tells me I have 389 projects.

There are 8 in the export. Also see the "20 projects found" message in the output.

Is the enumeration broken? I used my token. Even without logging in I can see more than 9 projects.

output below

Neo4j config is not valid.
Retrieving projects...
Extracting...
[------------------------------------] 0%
20 projects found.
Project statistics for project xxxx could not be fetched. You might need write access to fix this.
[#-----------------------------------] 5%
Project statistics for project xxxx could not be fetched. You might need write access to fix this.
[#######-----------------------------] 20% 00:00:09
Project statistics for project xxxx could not be fetched. You might need write access to fix this.
[##########--------------------------] 30% 00:00:08
Project statistics for project xxxx could not be fetched. You might need write access to fix this.
[#####################---------------] 60% 00:00:03
Project statistics for project xxxx could not be fetched. You might need write access to fix this.
[#######################-------------] 65% 00:00:03
Project statistics for project xxxx could not be fetched. You might need write access to fix this.
[#########################-----------] 70% 00:00:02
Project statistics for project xxxx could not be fetched. You might need write access to fix this.
[###########################---------] 75% 00:00:02
Project statistics for project xxxe could not be fetched. You might need write access to fix this.
[##################################--] 95% 00:00:00
Project statistics for project xxx could not be fetched. You might need write access to fix this.
[####################################] 100%
Exporting...

schlauch commented 2 years ago

It would be good to see your executed command line as well.

But have you tried the --all-elements option (be aware that this might take some time)? It is mentioned in https://github.com/DLR-SC/GitLab-Corpus/blob/master/docs/source/getting-started.rst#information

Without that option, the GitLab API calls use pagination and only retrieve the first page, which is about 20 projects. In addition, it seems that the extractor only extracts projects with visibility public or internal: https://github.com/DLR-SC/GitLab-Corpus/blob/a4a7ce30424f7b0b6846f6802c3beedf97e40a60/src/extract.py#L256
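To make the two effects concrete, here is a minimal sketch in Python. It is not the code from extract.py: `list_projects`, `keep_visible`, `fake_fetch_page` and the fake project list are hypothetical names made up for illustration, and the fake data only mimics a server that hands out 20 entries per page (the GitLab API default page size).

```python
from typing import Callable, Dict, List

Project = Dict[str, str]


def list_projects(
    fetch_page: Callable[[int, int], List[Project]],
    all_elements: bool = False,
    per_page: int = 20,
) -> List[Project]:
    """Return only the first page, or follow pagination to the end."""
    projects: List[Project] = []
    page = 1
    while True:
        batch = fetch_page(page, per_page)
        projects.extend(batch)
        # Stop after page 1 unless pagination was requested, or as soon
        # as the server returns a page that is not full (the last one).
        if not all_elements or len(batch) < per_page:
            break
        page += 1
    return projects


def keep_visible(projects: List[Project]) -> List[Project]:
    """Keep only projects whose visibility is public or internal."""
    return [p for p in projects if p.get("visibility") in ("public", "internal")]


# Hypothetical stand-in for a GitLab instance: 389 projects,
# every tenth one private (assumption, for illustration only).
FAKE_PROJECTS: List[Project] = [
    {"id": str(i), "visibility": "private" if i % 10 == 0 else "public"}
    for i in range(389)
]


def fake_fetch_page(page: int, per_page: int) -> List[Project]:
    """Return one page of the fake project list (pages start at 1)."""
    start = (page - 1) * per_page
    return FAKE_PROJECTS[start:start + per_page]


first_page = list_projects(fake_fetch_page)
everything = list_projects(fake_fetch_page, all_elements=True)
print(len(first_page), len(everything), len(keep_visible(everything)))
```

With the fake data, the first call stops at 20 projects while the second one walks through all pages. The visibility filter then drops the private ones, which is why the export can still be smaller than the number shown in the browser.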

Taking it all together:

Sorry for the bumps along the way, but let us not forget that this is a student project in version 0.1 :)

StephanJanosch commented 2 years ago

I did not see --all-elements. Now it seems to work!