Transkribus / TranskribusPyClient

A Pythonic API and some command line tools to access the Transkribus server via its REST API
GNU Lesser General Public License v3.0

Exclude images from download #13

Open o-sapov opened 1 year ago

o-sapov commented 1 year ago

Is this repository an appropriate place to ask questions about this client: https://pypi.org/project/transkribus-client?

How can images be excluded from a download? I found that there should be an option `bNoImage` (cf. https://github.com/Transkribus/TranskribusPyClient/blob/master/src/TranskribusPyClient/client.py).

But it is not clear to me where it should be passed in. Here is how the download was implemented by my predecessor:


```python
import logging
import os
import shutil
from pathlib import Path
from typing import List

from classes.conf import ConverterConfig
from classes.logger import Logger
from tqdm import tqdm
from transkribus import TranskribusAPI
from transkribus.models import Collection, Document

CONF = ConverterConfig()
LOG = Logger().get_logger()


class TranskribusDownloader:
    """
    TranskribusDownloader is a wrapper for inofficial transkribus-client
    (https://gitlab.com/arkindex/transkribus/-/blob/master/transkribus/api.py)
    """

    api: TranskribusAPI

    def downloadDocuments(
        self, colId: int, docIds: List[int], downloadDir: str, usePreviousDownload: bool
    ) -> List[Path]:
        """Downloads documents from Transkribus and returns a list of Paths
        of the directories containing the downloaded files.

        Attention! removes every file from downloadDir first

        Args:
            colId (int): id of transkribus collection
            docIds (List[int]): ids of transkribus documents
            downloadDir(str): parent directory for downloads

        Returns:
            List of Paths of the directories containing the downloaded files
        """
        directories = []
        downloadDir = Path(downloadDir)
        if not usePreviousDownload:
            for subdirectory in [f.path for f in os.scandir(downloadDir) if f.is_dir()]:
                shutil.rmtree(subdirectory)
        for docId in tqdm(docIds, desc='Downloading documents'):
            if not usePreviousDownload:
                collection: Collection = Collection(int(colId))
                doc: Document = Document(collection, int(docId))
                LOG.info(f'Downloading document {docId} from collection {colId}')
                doc.download(self.api, downloadDir)
            directories.append(Path(f'{downloadDir}/{docId}'))
        return directories

    def __init__(self):
        try:
            self.api = TranskribusAPI()
            self.api.login(CONF.transkribusUser(), CONF.transkribusPassword())
        except:
            LOG.error('No transkribus credentials provided. Could not log in.')
```

DRRV commented 1 year ago

No. Your package was developed by teklia.com.
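
If the client in use turns out to have no way to skip images, one possible workaround is to delete the image files after the download has finished. The sketch below is only that: it does not save any bandwidth, it does not use either client library, and both the function name `remove_images` and the list of image suffixes are assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterable, List

# Assumption: these are the image formats the download produces; adjust as needed.
IMAGE_SUFFIXES = frozenset({".jpg", ".jpeg", ".png", ".tif", ".tiff"})


def remove_images(download_dir: str, suffixes: Iterable[str] = IMAGE_SUFFIXES) -> List[Path]:
    """Delete image files anywhere below download_dir and return the removed paths."""
    wanted = {suffix.lower() for suffix in suffixes}
    removed = []
    # Collect the paths first (sorted() materializes the generator) so that
    # files are not deleted while the directory tree is still being walked.
    for path in sorted(Path(download_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in wanted:
            path.unlink()
            removed.append(path)
    return removed
```

In the class above, this could be called once on `downloadDir` after the loop in `downloadDocuments` has finished, leaving the non-image files in each document directory untouched.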
