What would you like to be added:
Today, some of the catalogers support the concept of 'File Ownership', specifically catalogers which implement `type FileOwner interface`.
For example, if I scan my DPKG directory using a directory source, the artifact metadata contains entries on which files are owned by my DPKG installation. Take `curl` as an example:
However, when scanning with a file source, we see no file metadata associated with the DPKG installation.
This makes sense since using a file source will cause the file resolver to only index the target file and its containing directory. So when the DPKG cataloger tries to resolve the 'Infos' directory after parsing the DPKG DB, the index will contain no entries & it will fail to resolve the file ownership metadata.
However, as a user, I do not know that I have missing metadata here unless I go and read the cataloger implementation and understand that it requires more than the scanned file to correctly populate its results.
I would like to start a discussion here regarding how feasible it would be to make catalogers 'aware' of the fact that they require > 1 file to successfully perform all of their work.
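To make the discussion more concrete, here is a minimal sketch of what 'awareness' could look like: a cataloger declares the extra paths it needs, and the scan checks them against what the resolver actually indexed. Everything here is hypothetical and for illustration only: `requirement`, `unmetGlobs`, and the example paths are my own names and assumptions, not existing Syft APIs.

```go
package main

import (
	"path"
	"sort"
)

// requirement is a hypothetical declaration of the extra files a cataloger
// needs beyond the file it parses. It is not an existing Syft type.
type requirement struct {
	Cataloger string
	Globs     []string // patterns in path.Match syntax
}

// unmetGlobs returns the declared patterns that match none of the indexed
// paths. A non-empty result means the cataloger's output may be incomplete,
// so the scan could warn the user or trigger additional indexing.
func unmetGlobs(req requirement, indexed []string) []string {
	var missing []string
	for _, g := range req.Globs {
		matched := false
		for _, p := range indexed {
			// A malformed pattern is treated as "no match" in this sketch.
			if ok, err := path.Match(g, p); err == nil && ok {
				matched = true
				break
			}
		}
		if !matched {
			missing = append(missing, g)
		}
	}
	sort.Strings(missing)
	return missing
}
```

With a requirement such as `/var/lib/dpkg/info/*.list` (an illustrative pattern), a scan whose index only contains the DB file would report that pattern as unmet, while a scan whose index also contains `/var/lib/dpkg/info/curl.list` would report nothing. Even without a second pass, this would at least let the tool tell me that metadata is missing.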
In the case of DPKG, for example, if it knows that we're scanning using a file source, it could then perform a 'second pass' and attempt to index the `Infos` or `status.d` directories used to determine file ownership, so that the resolver passed to `findDpkgInfoFiles` can find owned files despite using a file source.
Why is this needed:
When I scan with a file source, I'd like the catalogers to provide me with complete results even when a suitable cataloger requires more than one file to perform its work.
Additional context:
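As a rough illustration of the 'second pass' idea, the extra locations could be resolved relative to the directory of the single file that was scanned and then handed to the indexer. This is only a sketch under my own assumptions: `secondPassTargets` is a hypothetical helper, not part of Syft, and the relative directory names would have to come from each cataloger.

```go
package main

import "path"

// secondPassTargets is a hypothetical helper: given the one file a file
// source scanned and the locations a cataloger needs relative to that
// file's directory, it returns the cleaned paths a second pass could index.
func secondPassTargets(scannedFile string, relative []string) []string {
	base := path.Dir(scannedFile)
	targets := make([]string, 0, len(relative))
	for _, rel := range relative {
		// path.Join also cleans the result, so "../x" style entries resolve.
		targets = append(targets, path.Join(base, rel))
	}
	return targets
}
```

For a scanned file at `/var/lib/dpkg/status` and a relative entry `info`, this yields `/var/lib/dpkg/info`. Whether the second pass should be driven by the cataloger or by the source is exactly the part I'd like to discuss.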