Unstructured-IO / unstructured-ingest

Apache License 2.0

feat/enable github enterprise (v 3.10.8) connection #50

Open DanielBarbosabit opened 5 months ago

DanielBarbosabit commented 5 months ago

Is your feature request related to a problem? Please describe.

The GithubRunner works well for extracting data from GitHub, but the same runner cannot be used to extract data from GitHub Enterprise accounts.

Describe the solution you'd like

I would like to use the GithubRunner to extract data from a GitHub Enterprise account. To enable this, I believe the SimpleGitHubConfig class should accept a new parameter for the API base URL of the GitHub Enterprise instance, as shown in the code below:

from unstructured.ingest.connector.git import GitAccessConfig
from unstructured.ingest.connector.github import SimpleGitHubConfig
from unstructured.ingest.interfaces import PartitionConfig, ProcessorConfig, ReadConfig
from unstructured.ingest.runner import GithubRunner

if __name__ == "__main__":
    runner = GithubRunner(
        processor_config=ProcessorConfig(
            verbose=True,
            output_dir="github-ingest-output",
            num_processes=2,
        ),
        read_config=ReadConfig(),
        partition_config=PartitionConfig(),
        connector_config=SimpleGitHubConfig(
            url="<MyOrg>/<MyInternalRepo>",
            branch="main",
            access_config=GitAccessConfig(),
            # Proposed new parameter (does not exist yet): API base URL of the GitHub Enterprise host
            base_url="https://<host_of_my_github_enterprise>/api/v3",
        ),
    )
    runner.run()

Describe alternatives you've considered

Of course, the source code has to be compatible with both the GitHub and GitHub Enterprise APIs. I have already tested this, and it would be worthwhile to remove the condition on line 32 so that other GitHub hosts are allowed. As it stands, we cannot configure a GitHub Enterprise account, because it uses a different domain.
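To make the idea concrete, here is a minimal sketch of how the host check could be relaxed rather than removed outright. It is not the library's actual code: `allowed_host`, `parse_repo_url`, and the `base_url` argument are hypothetical names used only for illustration, and `ghe.example.com` is a placeholder host. The assumption is that the expected host defaults to github.com and is otherwise taken from the configured API base URL.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse

DEFAULT_HOST = "github.com"


def allowed_host(base_url: Optional[str]) -> str:
    """Return the host that repo URLs must match (hypothetical helper)."""
    if not base_url:
        return DEFAULT_HOST
    host = urlparse(base_url).netloc
    if not host:
        raise ValueError(f"base_url must be an absolute URL, got: {base_url!r}")
    return host


def parse_repo_url(url: str, base_url: Optional[str] = None) -> Tuple[str, str]:
    """Validate a repo reference and return (host, "owner/repo").

    Accepts either "owner/repo" or "https://<host>/owner/repo", where <host>
    is github.com by default or the host of base_url when one is given.
    """
    host = allowed_host(base_url)
    parsed = urlparse(url)
    fragments = [part for part in parsed.path.split("/") if part]

    if parsed.scheme and parsed.scheme != "https":
        raise ValueError(f"Only https URLs are supported, got: {url!r}")
    if parsed.netloc and parsed.netloc != host:
        raise ValueError(f"Expected host {host!r}, got: {parsed.netloc!r}")
    if len(fragments) != 2:
        raise ValueError(f"Expected '<owner>/<repo>', got: {url!r}")

    return host, "/".join(fragments)
```

With this shape, `parse_repo_url("MyOrg/MyRepo")` still resolves against github.com, while passing `base_url="https://ghe.example.com/api/v3"` lets a full `https://ghe.example.com/MyOrg/MyRepo` URL through. Keeping the check, instead of deleting it, would preserve the current error for mistyped hosts.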

Additional context

scanny commented 5 months ago

Thanks for creating this issue @DanielBarbosabit :). We're tracking this as an enhancement and will take a look at it more closely as soon as we have bandwidth. In the meantime, if you have an implementation in mind feel free to open a PR and we'd be happy to review!