TLINDEN / kleingebaeck

Kleingebäck - kleinanzeigen.de Backup
GNU General Public License v3.0

Improvement ideas #30

Closed: panzli closed this issue 10 months ago

panzli commented 10 months ago

First of all, thanks for your cool project.

I have a few ideas that could improve this project for me (and others?).

Last week I tried to build a Docker container from the main branch, and it always failed with an error while compiling the package.

Now my question: Is the main branch a stable branch?

Today I tried again, and now it works.

FROM golang:1.21.5
WORKDIR /root/
RUN apt-get update -y
RUN apt-get install -y git make
RUN git clone https://github.com/TLINDEN/kleingebaeck --branch main
# RUN git clone https://github.com/TLINDEN/kleingebaeck --branch v0.1.0 # fix for last week
WORKDIR /root/kleingebaeck
RUN make
RUN make install

version: "3.9"
services:

  kleinanzeigen-backup:
    container_name: kleinanzeigen-backup
    volumes:
      - ./config:/config
      - /mnt/ubuntu-nfs/backup/kleinanzeigen:/backup
    command: /bin/bash -c '
      kleingebaeck -u x -d && echo "success ad"  ; # telegram?
      kleingebaeck -u x -d && echo "success wd" ;
      kleingebaeck -u x -d && echo "success jb" '

    working_dir: /backup
    image: kleingebaeck:latest
    build: .

  1. In the new version v0.1.1, ads are saved with the ID.

Unfortunately this doesn't work in my environment: I use "hochschieben" (pushing an ad back to the top), the ad gets a new ID, and I end up with duplicates. I have already read your limitations, but could you please add a flag to exclude the ID from the ad's backup folder name?

  2. On the other hand, because I use "hochschieben", I sometimes get corrupted pictures that cannot be downloaded even though the server returns a 200 success code. When the scraper hits such a picture, it fails with the following error:

(to follow)

Unfortunately I couldn't find any option to skip errors. Could you please add another flag for this?

TLINDEN commented 10 months ago

Howdy!

First of all, thanks for your cool project.

that's great :)

Now my question: Is the main branch a stable branch?

No, not at this point. I cut a release whenever the code reaches a "stable" state, but the project is still work in progress, so everything is a bit in flux. For now, using the latest tag, v0.1.1, is the best option.

But you're right, it would be better to have main on the current stable release and do development on a devel branch. I'll change that.

FROM golang:1.21.5
WORKDIR /root/
RUN apt-get update -y
RUN apt-get install -y git make
RUN git clone https://github.com/TLINDEN/kleingebaeck --branch main
# RUN git clone https://github.com/TLINDEN/kleingebaeck --branch v0.1.0 # fix for last week
WORKDIR /root/kleingebaeck
RUN make
RUN make install

Thanks, I think I'll put a Dockerfile into the repo and probably also pre-build and distribute a ready-to-use image.

  1. In the new version v0.1.1, ads are saved with the ID.

Unfortunately this doesn't work in my environment: I use "hochschieben" (pushing an ad back to the top), the ad gets a new ID, and I end up with duplicates. I have already read your limitations, but could you please add a flag to exclude the ID from the ad's backup folder name?

Oh, I didn't know that pushing an ad gives it a new ID; I never use that function. I can revert that change, of course. In the meantime, however, you can configure the tool to use the old behavior. Just put this line into ~/.kleingebaeck:

adnametemplate = "{{ .Slug }}"
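The `{{ .Slug }}` syntax looks like Go's text/template, so the effect of that setting can be sketched as follows. This is only an illustration under that assumption: the `Ad` struct, its `ID` field, the sample values, and the second template are hypothetical and are not taken from the kleingebaeck source.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Ad is a hypothetical stand-in for the data handed to the template.
type Ad struct {
	Slug string
	ID   string
}

// renderDirName expands a directory-name template for a single ad.
func renderDirName(tpl string, ad Ad) (string, error) {
	t, err := template.New("adname").Parse(tpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, ad); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	ad := Ad{Slug: "old-bicycle", ID: "1234567890"}

	// The first template is the one from the thread; the second is an
	// assumed example of a name that also contains the ID.
	for _, tpl := range []string{"{{ .Slug }}", "{{ .Slug }}-{{ .ID }}"} {
		name, err := renderDirName(tpl, ad)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(name)
	}
}
```

With a slug-only template the directory name stays the same when the ID changes, which is why it avoids duplicates after an ad has been pushed.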
  2. On the other hand, because I use "hochschieben", I sometimes get corrupted pictures that cannot be downloaded even though the server returns a 200 success code. When the scraper hits such a picture, it fails with the following error:

(to follow)

Could you please re-run the command with the -d option and send me the output? If you think it isn't suitable for posting publicly here, send it via mail to git AT daemon DOT de. Thanks!

Unfortunately I couldn't find any option to skip errors. Could you please add another flag for this?

So you mean something like --ignore-errors? I'll consider it. It depends on the nature of the error. If "only" downloading an image fails, it would be safe to ignore it and continue. But if there is, for example, a file system error, the program must die.

And many thanks for the feedback and input!

Greetings, Tom

TLINDEN commented 10 months ago

The https://github.com/TLINDEN/kleingebaeck/tree/develop branch restores the old directory name without the ID.

TLINDEN commented 10 months ago

Some final notes:

The tool now retries a couple of times when downloading HTML pages or images, and it only dies when the last retry fails. In addition, there's a new command line option, --ignoreerrors. When it is set, errors during image download no longer abort the program. Failing to download an HTML page still aborts, because the whole enterprise doesn't make any sense without the page content.

Hope this helps.

best, Tom

panzli commented 10 months ago

Thank you, this solved all the problems I mentioned.

I don't want to annoy you, but today I found another possible enhancement. I tried to implement it myself despite my lack of Go knowledge. Unfortunately, I couldn't work out how to check whether a flag, such as "pictureoverwrite", has been provided.

My goal is to back up only the new ad content, without the images. This is crucial because when an existing ad is republished with a user-agent script, the images are degraded each time. I want to preserve only the original pictures and avoid pixel distortion after multiple republishes.

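// WriteImage saves the image read from reader to filename.
// An existing file is left untouched so the original picture is preserved.
// Requires the "io" and "os" imports.
func WriteImage(filename string, reader io.ReadCloser) error {
    // os.Stat succeeds only if the file already exists: keep it and return.
    if _, err := os.Stat(filename); err == nil {
        return nil
    }

    file, err := os.Create(filename)
    if err != nil {
        return err
    }
    defer file.Close()

    // Stream the image to disk; the copy error, if any, is the result.
    _, err = io.Copy(file, reader)
    return err
}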
TLINDEN commented 10 months ago

Would you mind opening a new issue for this? Thanks a lot!