flathunters / flathunter

A bot to help people with their rental real-estate search. 🏠🤖
GNU Affero General Public License v3.0

Integrate shell script? #43

Closed pcace closed 4 years ago

pcace commented 4 years ago

Hi,

Last year I was using an old version of flathunter, in which hunter.py looked very different. Back then I changed that file a little so that it ran my own shell script for every single ID that flathunter found. It looked like this:

...............
            for expose in results:
                # check if already processed
                if expose['id'] in processed:
                    continue

                # fire off the flat mailer script
                ident = expose['id']
                subprocess.call(['/home/pi/Desktop/flathunter/flathunter/mailer.sh', str(ident)])

                self.__log__.info('New offer: ' + expose['title'])

                # to reduce traffic, some addresses need to be loaded on demand
...............

How could I integrate this now?

Thanks a lot!!!

codders commented 4 years ago

Thanks for the question.

I've reorganised the code to try and make it as simple as possible. If you want to call an action for every expose, you can do that by implementing your own 'Processor'. You can see in 'abstract_processor.py' that a processor just needs to have the 'process_expose' function implemented - that will be called for every expose. You can see more examples in the 'default_processors.py' file.

Your new processor - we can call it 'ShellScriptProcessor' - will need to be included in the processor pipeline. For your use case, you could create a new file, e.g. 'email_hunter.py', and subclass the 'Hunter' object, overriding the 'hunt_flats' method (you can look at 'web_hunter.py' for an example of this).

Finally, in 'processor.py' you can add a function to the ProcessorChainBuilder to add your ShellScriptProcessor to the processor chain - again, there are lots of examples there already.
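
As a rough sketch of the processor idea described above (not code from the project): the class below only assumes what is stated in this thread, namely that a processor implements 'process_expose' and that it gets called for every expose. The name 'ShellScriptProcessor' is the one suggested above; the constructor argument and the 'process_exposes' helper are hypothetical, and the real base class in 'abstract_processor.py' may look different.

```python
import subprocess


class ShellScriptProcessor:
    """Hypothetical processor that runs an external command once per expose."""

    def __init__(self, command):
        # command is a list such as ['/path/to/script.sh'] (assumed, not a
        # project setting); the expose id is appended as the last argument.
        self.command = list(command)

    def process_expose(self, expose):
        # Run the command and ignore its exit code (check=False), then hand
        # the expose back unchanged so later steps still receive a dict.
        subprocess.run(self.command + [str(expose['id'])], check=False)
        return expose

    def process_exposes(self, exposes):
        # Hypothetical convenience helper: lazily apply process_expose
        # to every expose in an iterable.
        return map(self.process_expose, exposes)
```

Constructing it could look like ShellScriptProcessor(['/home/pi/mailer.sh']). How it gets wired into the ProcessorChainBuilder depends on what 'processor.py' looks like and is not shown here.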

I don't know how familiar you are with Python, but I've tried to make the code as readable as possible. It sounds like a reasonable feature and I would be happy to include it in the main project if the implementation is tidy. Right now, we configure the filters according to the 'config.yaml' file, but not the processors - we could also look at changing that.

Does that make any sense? Or is that already a bit complicated?

Thanks!

Arthur

pcace commented 4 years ago

Hey, thank you so much for your long answer. I think I understand your answer theoretically ;) but I fear that this is too complicated for me (I have never really got in touch with programming...). My skills end with writing simple shell scripts^^

What my shell script did was: take the ID of the ImmoScout item, download the whole web page plus images, pack everything into one PDF file, and send it to my Telegram bot.

Maybe there is a better way than calling a shell script from flathunter?

Cheers and thanks!!

codders commented 4 years ago

Okay. So there is a one-line version of the thing that you want...

        ...

        processor_chain = ProcessorChain.builder(self.config) \
                                        .save_all_exposes(self.id_watch) \
                                        .apply_filter(filter) \
                                        .resolve_addresses() \
                                        .calculate_durations() \
                                        .map(lambda expose: subprocess.call(['/home/pi/mailer.sh', str(expose['id'])])) \
                                        .build()
        ...

Just replace the sender_telegram() with a map() call. Does that do what you need?

pcace commented 4 years ago

Hey, thank you so much for your reply!!! I tried to use your code to start the shell script. It works one time, but then crashes with the following message:

Traceback (most recent call last):
  File "./flathunt.py", line 89, in <module>
    main()
  File "./flathunt.py", line 86, in main
    launch_flat_hunt(config)
  File "./flathunt.py", line 46, in launch_flat_hunt
    hunter.hunt_flats()
  File "/home/pi/Desktop/flathunter/flathunter/hunter.py", line 45, in hunt_flats
    self.__log__.info('New offer: %s', expose['title'])
TypeError: 'int' object is not subscriptable

The hunter.py looks like this now:


"""Default Flathunter implementation for the command line"""
import logging
import subprocess
from itertools import chain

from flathunter.config import Config
from flathunter.filter import Filter
from flathunter.processor import ProcessorChain

class Hunter:
    """Hunter class - basic methods for crawling and processing / filtering exposes"""
    __log__ = logging.getLogger('flathunt')

    def __init__(self, config, id_watch):
        self.config = config
        if not isinstance(self.config, Config):
            raise Exception("Invalid config for hunter - should be a 'Config' object")
        self.id_watch = id_watch

    def crawl_for_exposes(self, max_pages=None):
        """Trigger a new crawl of the configured URLs"""
        return chain(*[searcher.crawl(url, max_pages)
                       for searcher in self.config.searchers()
                       for url in self.config.get('urls', list())])

    def hunt_flats(self, max_pages=None):
        """Crawl, process and filter exposes"""
        filter_set = Filter.builder() \
                           .read_config(self.config) \
                           .filter_already_seen(self.id_watch) \
                           .build()

        processor_chain = ProcessorChain.builder(self.config) \
                                        .save_all_exposes(self.id_watch) \
                                        .apply_filter(filter_set) \
                                        .resolve_addresses() \
                                        .calculate_durations() \
                                        .map(lambda expose: subprocess.call(['/home/pi/Desktop/flathunter/flathunter/mailer.sh', str(expose['id'])])) \
                                        .build()

        result = []
        # We need to iterate over this list to force the evaluation of the pipeline
        for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
            self.__log__.info('New offer: %s', expose['title'])
            result.append(expose)

        return result


The shell script I am starting looks like this:

#!/bin/bash

echo $1 >> Mailer.log
echo "---mailer---"
echo "Anzeige: "$1

cd /home/pi/Desktop/

mkdir /home/pi/Desktop/tmp

cd tmp
echo "---downloading images---"
wget -nv -O- www.immobilienscout24.de/expose/$1 | grep -oP '(?<=<\/span> <a href=").*?(?=(" target))' | wget -i-
echo "done"
echo "---downloading documents---"
wget -nv -O- www.immobilienscout24.de/expose/$1 | grep -oP '(?<=data-ng-non-bindable data-src=\")(.*?)(?=\" data-caption)' | wget -i-
echo "done"
echo "---rename jpg---"

for f in * 
do mv "$f" "$f.jpg";
done
echo "done"
echo "---downloading complete website as pdf---"
wkhtmltopdf -s A4 --disable-smart-shrinking --zoom 1.0 www.immobilienscout24.de/expose/$1 website_$1.pdf

echo "done"
echo "---convert all jpgs to pdf---"
convert *.jpg $1.pdf
echo "done"
echo "---combine all PDFs---"
pdftk *.pdf cat output $1_expose.pdf
echo "done"

echo "---send expose via Telegram---"
curl -F disable_notification=true -F chat_id="-xxxxxxxxxx" -F document=@$1"_expose.pdf" -F caption="Kontakt Aufgenommen: www.immobilienscout24.de/expose/$1" https://api.telegram.org/xxxxxxxxx:xxxxxxxxxx/sendDocument
echo "done"

echo "--- cleanup---"
rm -f *.jpg
rm -f *.pdf

cd ..
echo "done"
echo "---send application---"
cp Mieteranschreiben.side Mieteranschreiben.tmp
sed -i 's/XXXXXXXXX/'$1'/g' Mieteranschreiben.tmp
selenium-side-runner Mieteranschreiben.tmp
echo "done"

echo "---cleanup2---"

rm Mieteranschreiben.tmp
rm -rf tmp

echo "done"

The whole thing works perfectly, but only once. After the whole shell script has run I get this error... Probably an easy fix, but like I said, shell scripts are basically the most advanced thing I am able to do^^

Any idea what the problem could be?

Thank you so much!!!!

codders commented 4 years ago

Yeah. Okay. So that 'map' call is supposed to transform the expose. In the case of the lambda I gave you, it turns the expose into an int, because int is the return value from the subprocess call.
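
To see the failure in isolation, here is a minimal sketch that uses Python's built-in map in place of the chain's map step. That substitution is an assumption about how the builder behaves, and the two exposes and the stand-in command are made up for illustration.

```python
import subprocess
import sys

# Made-up exposes; real ones come out of the crawler.
exposes = [{'id': 1, 'title': 'Flat A'}, {'id': 2, 'title': 'Flat B'}]

# Stand-in for mailer.sh: a Python process that does nothing and exits with 0.
command = [sys.executable, '-c', 'pass']

# The lambda returns whatever subprocess.call returns, which is the exit code,
# so every expose dict is replaced by an int.
exit_codes = list(map(lambda expose: subprocess.call(command + [str(expose['id'])]),
                      exposes))

# Logging expose['title'] afterwards is then an index into an int.
try:
    exit_codes[0]['title']
except TypeError as err:
    message = str(err)
```

The script itself runs fine for the first expose; the crash only happens afterwards, when the log line tries to read 'title' from the exit code.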

You need to make a separate function and pass that to map.


def run_script(expose):
    subprocess.call(['/home/pi/Desktop/flathunter/flathunter/mailer.sh', str(expose['id'])])
    return expose

...

        processor_chain = ProcessorChain.builder(self.config) \
                                        .save_all_exposes(self.id_watch) \
                                        .apply_filter(filter_set) \
                                        .resolve_addresses() \
                                        .calculate_durations() \
                                        .map(run_script) \
                                        .build()

...

Does that work?
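
If the script can fail or hang, a slightly more defensive variant might be worth having. This is only a sketch: 'make_script_runner', the 300 second timeout and the log messages are my assumptions, not anything that exists in flathunter. The one thing it keeps from the fix above is that the function always returns the expose.

```python
import logging
import subprocess

log = logging.getLogger('flathunt')


def make_script_runner(command, timeout=300):
    """Build a function for map() that runs `command` with the expose id appended."""

    def run_script(expose):
        try:
            # check=False: a non-zero exit code is logged below instead of raising.
            result = subprocess.run(command + [str(expose['id'])],
                                    timeout=timeout, check=False)
            if result.returncode != 0:
                log.warning('Script exited with code %s for expose %s',
                            result.returncode, expose['id'])
        except (OSError, subprocess.TimeoutExpired) as err:
            # OSError covers a missing or non-executable script.
            log.warning('Could not run script for expose %s: %s',
                        expose['id'], err)
        # Always hand the expose on, so the rest of the chain still gets a dict.
        return expose

    return run_script
```

It would be passed to the chain as .map(make_script_runner(['/home/pi/Desktop/flathunter/flathunter/mailer.sh'])), so one broken run of the script does not stop the remaining exposes from being processed.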

pcace commented 4 years ago

You are a genius!!!! thank you so much!!!!!

Works!