aboutcode-org / purldb

Tools to create and expose a database of purls (Package URLs). This project is sponsored by the NLnet project https://nlnet.nl/project/vulnerabilitydatabase/ and by nexB for https://www.aboutcode.org/ . Chat is at https://gitter.im/aboutcode-org/discuss
https://purldb.readthedocs.io/

Error when visiting rpm packages #32

Open JonoYang opened 1 year ago

JonoYang commented 1 year ago
purldb-visitor-1  | ERROR:minecode.management.commands.run_visit:Visit error for URI: rsync://mirrors.kernel.org/centos/
purldb-visitor-1  | TypeError TypeError('must be real number, not str')
purldb-visitor-1  | Traceback (most recent call last):
purldb-visitor-1  |   File "/app/minecode/management/commands/run_visit.py", line 255, in visit_uri
purldb-visitor-1  |     new_uris_to_visit, visited_data, visit_error = _visit_router.process(uri_to_visit)
purldb-visitor-1  |   File "/app/minecode/route.py", line 164, in process
purldb-visitor-1  |     return endpoint(string, *args, **kwargs)
purldb-visitor-1  |   File "/app/minecode/visitors/repodata_rpms.py", line 53, in collect_repomd_urls
purldb-visitor-1  |     directory_listing = rsync.fetch_directory(uri)
purldb-visitor-1  |   File "/app/minecode/rsync.py", line 151, in fetch_directory
purldb-visitor-1  |     raise Exception('%(cmd) failed. rc:%(tc)d err: %(err)s' % locals())
purldb-visitor-1  | TypeError: must be real number, not str

35C4n0r commented 1 year ago

@JonoYang, can you please tell me how to reproduce this issue? I tried to reproduce it by running fetch_directory("rsync://mirrors.kernel.org/centos/") but wasn't able to. I was also unable to locate repodata_rpms.py, which is referenced in this part of the traceback:

purldb-visitor-1  |   File "/app/minecode/visitors/repodata_rpms.py", line 53, in collect_repomd_urls
purldb-visitor-1  |     directory_listing = rsync.fetch_directory(uri)

35C4n0r commented 1 year ago

@JonoYang, this is the error log I collected after adding a few print statements in rsync.py::fetch_directory:

uri =>  rsync://mirrors.kernel.org/centos/
command =>  rsync --no-motd --recursive -d "rsync://mirrors.kernel.org/centos/"
err =>  @ERROR: max connections (200) reached -- try again later

rsync error: error starting client-server protocol (code 5) at main.c(1675) [Receiver=3.1.3]

out =>  <_io.TextIOWrapper name=6 encoding='UTF-8'>
rc =>  None
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/jay/GSoC/purldb/minecode/rsync.py", line 158, in fetch_directory
    raise Exception('%(cmd) failed. rc:%(tc)d err: %(err)s' % locals())
TypeError: must be real number, not str

The printed command looks fine. As for the error (err => @ERROR: max connections (200) reached -- try again later), maybe the subprocess is making too many requests (command.py::Command)?
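One detail in the output above may be worth checking separately: rc is None and out is a file object rather than text. In Python, a subprocess.Popen object's returncode stays None until the process has been waited on (poll, wait, or communicate). The following is a minimal sketch of collecting the exit code and stderr with the standard library only. It is not how command.py in purldb works (that is not shown in this thread); run_command is a hypothetical helper, and a small Python child process stands in for the failing rsync call.

```python
import subprocess
import sys


def run_command(args):
    """Run args to completion and return (returncode, stdout, stderr).

    Hypothetical helper for illustration; not part of purldb.
    """
    # subprocess.run waits for the process to exit, so returncode is an int
    # here, and stdout/stderr are captured as text instead of file objects.
    completed = subprocess.run(args, capture_output=True, text=True)
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == '__main__':
    # Stand-in for the failing rsync call: write to stderr, then exit with 5.
    child = [
        sys.executable,
        '-c',
        "import sys; sys.stderr.write('max connections reached'); sys.exit(5)",
    ]
    rc, out, err = run_command(child)
    print(rc, repr(err))
```

With the exit code available as an integer, an error message that formats rc with %d would no longer be handed None.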

JonoYang commented 1 year ago

@35C4n0r

Thanks for looking into this problem!

This is how I ran the visitor on rsync://mirrors.kernel.org/centos/ only:

  1. Start with a clean database by running make postgres.
  2. Comment out all of the other lines at https://github.com/nexB/purldb/blob/main/minecode/visitors/repodata_rpms.py#L29 except for yield 'rsync://mirrors.kernel.org/centos/'
  3. Replace the line https://github.com/nexB/purldb/blob/main/purldb/settings.py#L273 with 'minecode.visitors.repodata_rpms.RPMRepoDataSeed'
  4. Run make run_visit, which will seed the ResourceURI table with rsync://mirrors.kernel.org/centos/ and then visit it.

However, I'm not experiencing the error I reported when I do this. I think you are onto something with regard to the error that rsync returned. I have to double-check on the machine where the error is happening to see what's going on. Maybe the message is not being formatted properly.
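On the formatting suspicion: the format string quoted in both tracebacks has no conversion type after %(cmd). Python's % operator then reads the following space as a flag and the "f" of "failed" as a float conversion, so it tries to format the command string as a number, which would account for "must be real number, not str". The string also refers to %(tc)d, while the printed variable is rc. Below is a small sketch that reproduces this outside of purldb; broken_format and format_failure are hypothetical names, and the corrected message is only one possible wording.

```python
def broken_format(cmd, rc, err):
    """Reproduce the format string quoted in the traceback."""
    values = {'cmd': cmd, 'rc': rc, 'err': err}
    # '%(cmd) f' is parsed as: key "cmd", space flag, conversion type "f".
    # A str value cannot be formatted as a float, so this raises TypeError.
    return '%(cmd) failed. rc:%(tc)d err: %(err)s' % values


def format_failure(cmd, rc, err):
    """Build the failure message with an explicit conversion for every key."""
    # %s is used for rc because rc may be None, which %d would reject.
    values = {'cmd': cmd, 'rc': rc, 'err': err}
    return '%(cmd)s failed. rc:%(rc)s err: %(err)s' % values


if __name__ == '__main__':
    try:
        broken_format('rsync', None, 'max connections reached')
    except TypeError as exc:
        print('broken:', exc)
    print(format_failure('rsync', None, 'max connections reached'))
```

If this reading is right, the TypeError only shows up when rsync fails, because the exception message is only built on the failure path. That would fit with the error appearing on one machine (where the mirror rejected the connection) and not on another.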