Hello!
Description of Problem: We wanted to concatenate clean_header and clean_body to create a clean_text column. As you can see on the screenshot below, we did this after applying the Transformer Pipeline. Following the examples in the tutorials, we then want to use an NLP Pipeline with a phraser function and a tokenizer. Our problem: we can apply the tokenizer to the column we want (clean_text), but we can't apply the phraser to it. There are only two phraser functions in melusine (phraser_on_body and phraser_on_header), and they apply the phraser to the clean_body and clean_header columns respectively. We also can't concatenate clean_body and clean_header after applying their phraser functions, because those functions run in the same pipeline as the tokenizer, which needs clean_text to already exist.
Overview of the Solution: One possible solution is to add a phraser function for clean_text in "melusine/nlp_tools/phraser.py". It would allow us to apply the phraser to a column named clean_text.
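To make the proposal concrete, here is a minimal sketch of what such a function could look like. It does not use melusine itself: make_column_phraser, phraser_on_clean_text and ToyPhraser are hypothetical names, and the row-wise signature (row, phraser) and the idea that the phraser can be called on a string are assumptions on our side, not the library's confirmed API.

```python
import re


def make_column_phraser(column):
    """Build a row-wise phraser function for any column (hypothetical helper)."""

    def phraser_on_column(row, phraser):
        # Assumption: `phraser` is a callable that maps a string to a string.
        return phraser(row[column])

    return phraser_on_column


class ToyPhraser:
    """Stand-in for a trained phraser: joins known word pairs with '_'."""

    def __init__(self, phrases):
        self.phrases = phrases

    def __call__(self, text):
        for first, second in self.phrases:
            pattern = rf"\b{re.escape(first)} {re.escape(second)}\b"
            text = re.sub(pattern, f"{first}_{second}", text)
        return text


# The function we are asking for, targeting the concatenated column.
phraser_on_clean_text = make_column_phraser("clean_text")

# A plain dict stands in for one row of the emails DataFrame.
row = {
    "clean_header": "car insurance",
    "clean_body": "please cancel my car insurance",
}
row["clean_text"] = row["clean_header"] + " " + row["clean_body"]

phrased = phraser_on_clean_text(row, ToyPhraser([("car", "insurance")]))
print(phrased)
```

Taking the column name as a parameter, rather than hard-coding clean_text, would also cover phraser_on_body and phraser_on_header with the same code, but that is only a suggestion.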
Result: