davidmigloz / langchain_dart

Build LLM-powered Dart/Flutter applications.
https://langchaindart.dev
MIT License
403 stars 70 forks source link

Add support for GoogleDriveLoader #71

Open davidmigloz opened 1 year ago

davidmigloz commented 1 year ago

Dart packages:

Sahil-kachhap commented 1 year ago

@davidmigloz I am interested in working on adding this support so can you assign this to me. Also, Can you share any relevant reference other than the official documentation (above python codebase reference link is broken). And yes, Is there any specific time period under which I have to complete this functionality ?

I saw @faithoflifedev has worked on some of the document loader implementations and this is my first interaction with this repo. Will you be interested in sharing any piece of advice or suggestions on this.

davidmigloz commented 1 year ago

hey @Sahil-kachhap ,

Happy to hear that you are willing to collaborate with the project! 🙂

I've just fixed the links that point to the LangChain Python documentation and implementation. I would recommend that you take a look to it, then get familiar with the googleapis package and finally look at some of the other document loaders implementations to have a reference. There is no specific deadlines for this feature, so no pressure.

If you have any question or need more guidance don't hesitate to ask in the #contributing channel of our Discord Server, (or in this issue).

faithoflifedev commented 1 year ago

Hi @Sahil-kachhap, I'll help where I can.

I've worked with the googleapis_auth package in my yt package that wraps a lot of the Youtube APIs.

I've been hesitant to start on any cloud file system loaders, since I really haven't had the time to research how they deal with different file types (CSV, JSON, etc.) so I figured my time was better spent on the individual file types thinking that these might need to be integrated into a cloud loader.

Sahil-kachhap commented 1 year ago

[Need Review of below code snippet]: Dart Implementation of loadCredential Method

@faithoflifedev @davidmigloz here is a snippet of loadCredentials method that returns an authenticated HTTP client which can then be used by all other methods to interact with Google APIs:-

  /// Loads credentials from serviceAccountKey, tokenPath or credentialsPath (whichever exists)
  /// and returns an auto refreshing authenticated HTTP client which can then be utlisied to interact with google APIs.
  Future<AutoRefreshingAuthClient> loadCredentials() async {
    AutoRefreshingAuthClient? client;
    try {
      // If serviceAccountKey Path is not null, load credentials from serviceAccountKey.
      // By default it is set to this path '~/.credentials/credentials.json'
      // If serviceAccountKey file does not exist, then it loads credentials from tokenPath.
      if (serviceAccountKey != null) {
        final File file = File(serviceAccountKey!);
        if (file.existsSync()) {
          final String contents = await file.readAsString();
          final Map<String, dynamic> credentials = jsonDecode(contents) as Map<String, dynamic>;
          client = await clientViaServiceAccount(ServiceAccountCredentials.fromJson(credentials), scopes);
        }
      } else if (tokenPath != null) {
        /// clientViaApplicationDefaultCredentials method: Looks for credentials in the following order of preference:
        //1. A JSON file whose path is specified by GOOGLE_APPLICATION_CREDENTIALS, this file typically contains exported service account keys.
        //2. A JSON file created by gcloud auth application-default login in a well-known location (%APPDATA%/gcloud/application_default_credentials.json on Windows and $HOME/.config/gcloud/application_default_credentials.json on Linux/Mac).
        //3. On Google Compute Engine and App Engine Flex we fetch credentials from GCE metadata service.
        client = await clientViaApplicationDefaultCredentials(scopes: scopes);
      } else {
        /// if credentialsPath is valid, load credentials from credentialsPath.
        final File credentialFile = File(credentialsPath!);
        if (credentialFile.existsSync()) {
          final String contents = await credentialFile.readAsString();
          final Map<String, dynamic> credentials = jsonDecode(contents) as Map<String, dynamic>;
          // Obtains oauth2 credentials and returns an authenticated HTTP client.
          client = await clientViaUserConsent(ClientId.fromJson(credentials),scopes, (final String url) async {});
        }
      }

      if (client == null) {
        throw Exception('Error occured while loading credentials');
      }
      return client;
    } catch (e) {
      //print(e);
      throw Exception('Error occured while loading credentials');
    } finally {
      client?.close();
    }
  }
davidmigloz commented 1 year ago

@Sahil-kachhap thanks for your work.

I think it's better if LangChain.dart is not responsible for the authentication/authorization part, but delegates that to googleapis_auth package. As there are many ways you may want to authenticate depending on your use case.

I would make the loader require a AuthClient in the constructor and then it's up to the user of the loader how it creates the authenticated HTTP client.

That's the strategy I've followed for implementing the Vertex AI API client.

Let me know if that makes sense to you.

By the way, I forgot to mention that the GoogleDriveLoader should go in the langchain_google package (instead of the main langchain package where the basic loaders are). I've created a directory where you can add it.

faithoflifedev commented 1 year ago

HI @Sahil-kachhap , no comments on the code specifically. More generally, my approach to this type of scenario was to use different static methods for the varied authentication types: static Future<AutoRefreshingAuthClient>withEnvironmentVar() static Future<AutoRefreshingAuthClient>withOAuth(ClientId? oAuthClientId) static Future<AutoRefreshingAuthClient>withFile(String fileNameAndPath) static Future<AutoRefreshingAuthClient>withGenerator(RefreshTokenGenerator refreshTokenGenerator)

Sahil-kachhap commented 1 year ago

Oh yes that really makes sense @davidmigloz . Got it, I will now add the AuthClient in the constructor and yes will move the current progress in the code to that relevant package (langchain_google).

Sahil-kachhap commented 1 year ago

HI @Sahil-kachhap , no comments on the code specifically. More generally, my approach to this type of scenario was to use different static methods for the varied authentication types: static Future<AutoRefreshingAuthClient>withEnvironmentVar() static Future<AutoRefreshingAuthClient>withOAuth(ClientId? oAuthClientId) static Future<AutoRefreshingAuthClient>withFile(String fileNameAndPath) static Future<AutoRefreshingAuthClient>withGenerator(RefreshTokenGenerator refreshTokenGenerator)

@faithoflifedev Oh Yeah that's a good approach too but since I created the class for googledriveloader functionality, I doubted, whether to place all these auth methods in the same place as other loader methods. Like I wanted to keep all stuffs focused towards loader functionality (Single Responsibility Principle)

davidmigloz commented 1 year ago

We can always add convenient factory constructors later on if needed, but I would keep it simple for now.