Hi, and thanks for writing the RAKE algorithm using NLTK.
I noticed that the code keeps phrases as lists of words, which makes it harder to compute the list of unique phrases. I changed the phrases to tuples instead of lists. Tuples are hashable, so the collection of phrases can be a set, which gives you unique phrases with less code and runs faster as well.
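To illustrate the idea, here is a minimal sketch. It is not the library's actual code: the function name `unique_phrases`, the stopword-based splitting, and the sample data are assumptions made up for this example. It only shows why tuples allow a set where lists do not.

```python
from itertools import groupby


def unique_phrases(words, stopwords):
    """Split a word sequence at stopwords and return the unique phrases.

    Hypothetical helper for illustration only. Each phrase is stored as a
    tuple, which is hashable, so a set removes duplicates automatically.
    """
    phrases = set()
    # groupby yields runs of consecutive words that share the same key,
    # here: "is this word a stopword?"
    for is_stopword, group in groupby(words, key=lambda w: w in stopwords):
        if not is_stopword:
            phrases.add(tuple(group))
    return phrases


words = ["red", "apple", "and", "green", "apple", "and", "red", "apple"]
result = unique_phrases(words, {"and"})

# Lists cannot be set members because they are unhashable.
try:
    {["red", "apple"]}
    lists_are_hashable = True
except TypeError:
    lists_are_hashable = False
```

With lists, deduplication needs an explicit membership check per phrase, which is linear in the number of phrases already seen. With tuples in a set, each insertion is an average constant-time hash lookup.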
Coverage decreased (-0.1%) to 98.75% when pulling c0be29399bbc998a3f50b634672fa47cdfed2cc2 on cgratie:master into 4224305426f310928bd8449e11f8604c076559b9 on csurfer:master.
Coverage increased (+1.1%) to 100.0% when pulling ec8ffe11a13d55b83c8d5d6030e6865b400a3467 on cgratie:master into 4224305426f310928bd8449e11f8604c076559b9 on csurfer:master.
Coverage increased (+1.1%) to 100.0% when pulling db1bca72302a3d4060711090a1e5754ccda012bf on cgratie:master into 4224305426f310928bd8449e11f8604c076559b9 on csurfer:master.