GDGSongDo / kakao-dev-article

[POC] openchat links extraction poc #1

Open injae-kim opened 9 months ago

injae-kim commented 9 months ago
import pprint
import re
from collections import defaultdict

# KakaoTalk exports mark each day with a date such as "2023년 10월 1일".
date_regex = re.compile(r"\d{4}년 \d{1,2}월 \d{1,2}일")
url_regex = re.compile(r"(https?:\/\/)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)")

cur_date = ""
# Maps each date string to the links shared on that date, in order of appearance.
month_week_and_link_urls = defaultdict(list)

with open("KakaoTalk_20231006_1604_34_156_group.txt", "r", encoding="UTF8") as f:
    for line in f:
        # Remember the most recent date seen; later links are filed under it.
        date = date_regex.search(line)
        if date is not None:
            cur_date = date.group()

        # Only the first URL on each line is captured.
        link = url_regex.search(line)
        if link is not None:
            month_week_and_link_urls[cur_date].append(link.group())

# Convert back to a plain dict so pprint shows only the mapping.
pprint.pprint(dict(month_week_and_link_urls))
{'2023년 10월 1일': ['http://bit.ly/45aU7oc',
                  'https://www.youtube.com/watch?v=zp6nybNYjBQ&list=PLSCuU2a9seuO4xpzlC7dRjrVMhV6idD42',
                  'https://youtu.be/zp6nybNYjBQ',
                  'https://youtu.be/p_q4ECN33Yc',
                  'https://youtu.be/uXS0kiJQMtw',
                  'https://youtu.be/xf4kI_emeFo',
                  'https://youtu.be/XsbKfvznouA',
                  'https://youtu.be/nj2rVsu5n8w',
                  'https://youtu.be/A2yOLycDuI4'],
 '2023년 10월 4일': ['https://medium.com/@s4.ali/flutter-code-review-dos-and-don-ts-and-best-practices-1-5d003035953e'],
 '2023년 10월 5일': ['https://www.youtube.com/playlist?list=PLSCuU2a9seuO4xpzlC7dRjrVMhV6idD42',
                  'https://festa.io/events/4014',
                  'https://janggiraffe.tistory.com/m/405',
                  'https://n.news.naver.com/mnews/article/028/0002658981'],
 '2023년 10월 6일': ['https://www.youtube.com/watch?v=_EYk-E29edo'],
 '2023년 9월 20일': ['https://medium.com/srivatsan-sridharan/how-to-grow-as-an-engineering-manager-687cad0bcac7',
                  'https://medium.com/microsoft-mobile-engineering/scaling-teams-mobile-development-evolving-the-design-pattern-c3c8ff53facb',
                  'https://devday.openai.com/',
                  'https://blog.dramancompany.com/2020/11/java-spring-ruby-on-rails/',
                  'https://medium.com/@iamprovidence/backend-side-architecture-evolution-n-layered-ddd-hexagon-onion-clean-architecture-643d72444ce4',
                  'https://medium.com/medium-eng/the-stack-that-helped-medium-drive-2-6-millennia-of-reading-time-e56801f7c492',
                  'https://entrepreneurshandbook.co/the-2025-programmer-has-just-one-option-6a819aefecd9',
                  'https://festa.io/events/4014'],
 '2023년 9월 21일': ['https://spring.io/blog/2023/09/20/hello-java-21',
                  'https://github.com/facebook/fresco/releases',
                  'https://medium.com/graalvm/graalvm-for-jdk-21-is-here-ee01177dd12d',
                  'https://svelte.dev/blog/runes',
                  'https://medium.com/javarevisited/what-is-quarkus-is-it-an-alternative-to-springboot-f88e6356b1e8',
                  'https://developerinsider.co/fix-xcode-15-dt_toolchain_dir-cannot-be-used-to-evaluate-library_search_paths-use-toolchain_dir-instead/',
                  'https://m.yna.co.kr/view/AKR20230921018900091?input=1195m',
                  'https://m.thisisgame.com/webzine/news/nboard/263/?n=176824',
                  'https://devblogs.microsoft.com/oldnewthing/20230911-00/?p=108749',
                  'https://developers.redhat.com/blog/2018/10/22/introduction-to-linux-interfaces-for-virtual-networking',
                  'https://blog.goorm.io/hominlee/',
                  'https://youtu.be/CvREqfmaum8',
                  'https://cloudonair.withgoogle.com/events/google-cloud-startup-summit-seoul-2023/watch?talk=session3',
                  'https://forum.dotnetdev.kr/t/blazor-workshop-2023/8072',
                  'https://medium.com/@dmosyan/how-does-facebook-handle-billions-of-async-requests-8b00abf32b69',
                  'https://medium.com/stackademic/why-did-elon-musk-say-that-rust-is-the-language-of-agi-eb36303ce341',
                  'https://medium.com/ai-in-plain-english/mojo-python-upgrades-db4561232724',
                  'https://news.hada.io/topic?id=10726',
                  'https://festa.io/events/4014'],
 '2023년 9월 22일': ['https://www.woowacourse.io/apply',
                  'https://www.theverge.com/2023/9/21/23880882/microsoft-365-copilot-ai-release-date',
                  'https://www.youtube.com/watch?v=XYUEQ0SyOyE',
                  'https://www.theverge.com/2023/9/21/23882074/microsoft-surface-copilot-event-2023-biggest-announcements',
                  'https://developer.android.com/studio/preview/studio-bot/availability?fbclid=IwAR37pnZZgs3pgGI9JbKECCHbPYwqtNR0lDDC01rIg5TyZHsVR3MukLvKKWw',
                  'https://zig.news/kristoff/extend-a-c-c-project-with-zig-55di',
                  'https://medium.com/@cazad3011/my-google-interview-experience-d0377057243b',
                  'https://android-developers.googleblog.com/2023/09/studio-bot-expands-to-international-markets.html',
                  'https://crast.net/363138/new-google-studio-bot-ai-coding-assistant-helps-you-code-faster/',
                  'https://youtu.be/U6s2pdxebSo?feature=shared',
                  'https://andy-bell.co.uk/a-more-modern-css-reset/',
                  'https://www.codeproject.com//Articles/5368078/Reverse-engineering-Linear-PRNG-with-Exploratory-S',
                  'https://docs.oracle.com/en/java/javase/21/core/virtual-threads.html#GUID-8AEDDBE6-F783-4D77-8786-AC5A79F517C0',
                  'https://www.youtube.com/watch?v=G4LK_euTadU',
                  'https://github.com/cashapp/redwood.git',
                  'https://9to5google.com/2023/09/21/google-android-studio-bot-ai-global/',
                  'https://medium.com/@ksjmgrkks/%ED%95%A8%EC%88%98%ED%98%95-%EC%84%A0%EC%96%B8%ED%98%95-%ED%94%84%EB%A1%9C%EA%B7%B8%EB%9E%98%EB%B0%8D-%EA%B7%B8%EB%A6%AC%EA%B3%A0-%EC%95%88%EB%93%9C%EB%A1%9C%EC%9D%B4%EB%93%9C-62c3e610aa63',
                  'https://festa.io/events/3887',
                  'https://festa.io/events/4014',
                  'https://whoisnnamdi.com/never-enough-developers/'],
 '2023년 9월 23일': ['https://thdev.net/894',
                  'https://www.cafe-encounter.net/p1988/postgres-quick-start-for-sql-server-t-sql-developers',
                  'https://security.googleblog.com/2023/09/capslock-what-is-your-code-really.html?m=1',
                  'https://source.android.com/docs/setup/build/rust/building-rust-modules/overview',
                  'https://spring.io/blog/2023/09/20/hello-java-21',
                  'https://medium.com/anchorage/a-little-code-is-better-than-a-little-infrastructure-3371b5903874',
                  'https://news.hada.io/topic?id=11006',
                  'https://holykisa.tistory.com/112',
                  'https://www.notion.so/25-526c233bbb844a569fec3bf6b6983777'],
 '2023년 9월 24일': ['https://blog.stackademic.com/what-happens-when-you-reach-the-age-of-35-as-a-programmer-5bb7907bce91',
                  'https://teamlearners.career.greetinghr.com/',
                  'https://code-maze.com/csharp-datetimeoffset-vs-datetime/',
                  'https://quasilyte.dev/blog/post/gen-map/',
                  'https://medium.com/coderhack-com/functional-programming-using-rust-3776c10cfc6',
                  'https://boingboing.net/2023/09/11/u-s-announces-official-web-design-system.html',
                  'https://festa.io/events/4014',
                  'https://www.youtube.com/watch?v=tcFz6NY3zpc'],
 '2023년 9월 25일': ['https://adriano.fyi/posts/2023-09-24-choose-postgres-queue-technology/',
                  'https://yozm.wishket.com/magazine/detail/2238/?utm_source=oneoneone',
                  'https://shawxingkwok.github.io/ITWorks/docs/multiplatform/mvb/android/',
                  'https://medium.com/a-day-of-a-programmer/%ED%95%A8%EA%BB%98-%EC%A6%90%EA%B2%A8%EC%9A%94-%EC%83%9D%EC%82%B0%EC%84%B1-%EA%B0%9C%EB%B0%9C%ED%99%98%EA%B2%BD-%EC%84%A4%EC%A0%95-%ED%8C%81%EA%B3%BC-%EC%95%B1-%EC%B6%94%EC%B2%9C-b5b3cfbbcecc',
                  'https://brunch.co.kr/@chickenmoim/32',
                  'https://jojoldu.tistory.com/734',
                  'https://festa.io/events/4014'],
 '2023년 9월 26일': ['https://engineercodex.substack.com/p/how-facebook-scaled-memcached',
                  'https://insight.infograb.net/blog/2023/06/28/gitlab-ai/',
                  'https://speakerdeck.com/taehwandev/android-mvvm-paeteonyi-jeobgeunbeob-2023-deuroideu-naiceu',
                  'https://www.androidpolice.com/good-lock-overhaul-one-ui-6/',
                  'https://ziglang.org/news/bounties-damage-open-source-projects/',
                  'https://speakerdeck.com/taehuniy/droidknights-2023-jeongtaehun-composero-wijeseul-mandeundago-glancereul-iyonghan-caryang-wijes-gaebalgi',
                  'https://news.hada.io/topic?id=11063',
                  'https://major.io/p/quadlets-replace-docker-compose/',
                  'https://developer.apple.com/forums/thread/734244',
                  'https://github.com/rickclephas/KMM-ViewModel?fbclid=IwAR0oF-Empq11-aQFng6sDqUGvC_fHIrh9ev7GE9pqSB_9JsR81gRx320Fks'],
 '2023년 9월 27일': ['https://festa.io/events/4014',
                  'https://news.hada.io/topic?id=11073',
                  'https://lp.jetbrains.com/gamedev-day-2023/#agenda-241',
                  'https://brunch.co.kr/@supims/544',
                  'https://github.com/bizz84/flutter-tips-and-tricks',
                  'https://inthiswork.com/archives/73787',
                  'https://f-lab.career.greetinghr.com/',
                  'https://developers-kr.googleblog.com/2023/09/studio-bot-expands-to-international-markets.html?m=1',
                  'https://www.openproject.org/docs/installation-and-operations/installation/docker/',
                  'https://gist.github.com/markasoftware/f5b2e55a2c2e3abb1f9eefcdf0bfff45',
                  'https://velog.io/@skydoves/open-source-machenism',
                  'https://kazlauskas.dev/flutter-app-lifecycle-listener-overview/',
                  'https://modulabs.im/popdetail/6513fbe9edad154a661e437a'],
 '2023년 9월 28일': ['https://macoscontainers.org/'],
 '2023년 9월 29일': ['https://www.androidpolice.com/websites-block-google-bard-using-their-content/',
                  'https://festa.io/events/4014',
                  'https://steven-giesel.com/blogPost/1b2a4f18-86da-42d3-9ddc-8b41ed1eba0f',
                  'https://medium.com/coryodaniel/from-erverless-to-elixir-48752db4d7bc'],
 '2023년 9월 30일': ['https://www.youtube.com/watch?v=A2yOLycDuI4']}
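
One thing to note about the output above: `pprint` sorts dict keys as strings, so `'2023년 10월 1일'` is printed before `'2023년 9월 20일'`. If chronological order is wanted, a minimal sketch could look like the following. The helper names `parse_date_key` and `sort_by_date` are hypothetical and not part of the script above, and the sample dict is a two-entry excerpt of the output.

```python
import re
from datetime import date

# Hypothetical helper (not in the original script): turns a key such as
# "2023년 10월 1일" into a datetime.date so keys can be ordered by calendar date.
_DATE_KEY = re.compile(r"(\d{4})년 (\d{1,2})월 (\d{1,2})일")


def parse_date_key(key):
    match = _DATE_KEY.fullmatch(key)
    if match is None:
        raise ValueError(f"unexpected date key: {key!r}")
    year, month, day = map(int, match.groups())
    return date(year, month, day)


def sort_by_date(links_by_date):
    """Return a new dict whose keys are in chronological order."""
    return {
        key: links_by_date[key]
        for key in sorted(links_by_date, key=parse_date_key)
    }


# Two entries taken from the output above, in the order pprint printed them.
sample = {
    "2023년 10월 1일": ["http://bit.ly/45aU7oc"],
    "2023년 9월 28일": ["https://macoscontainers.org/"],
}
ordered = sort_by_date(sample)
print(list(ordered))
```

Plain dicts keep insertion order, so iterating over `ordered` (or printing it with `pprint.pprint(ordered, sort_dicts=False)`) yields September before October.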

TODO

  1. 민정님: Convert the Python dict to Markdown. The Markdown format still needs to be decided.
  2. 동혁님: Show each link's thumbnail. Detail: parse the thumbnail and title from the page's HTML meta tags and render them as an image (ref. link).
  3. 일표님: Upload the result to GitHub as a PR.
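
As a starting point for items 1 and 2, here is a rough sketch using only the standard library. Everything in it is an assumption rather than a decided design: the Markdown layout (one `##` heading per date, one bullet per link), the function names `to_markdown` and `extract_preview`, and the choice of Open Graph tags (`og:title`, `og:image`) as the meta tags to read. The sample HTML is made up for illustration, and fetching each page is left out.

```python
from html.parser import HTMLParser


def to_markdown(links_by_date):
    """Render {date: [url, ...]} as one heading per date and one bullet per URL."""
    lines = []
    for day, urls in links_by_date.items():
        lines.append(f"## {day}")
        lines.append("")
        # <url> is Markdown autolink syntax, so each URL renders as a clickable link.
        lines.extend(f"- <{url}>" for url in urls)
        lines.append("")
    return "\n".join(lines)


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> values from an HTML page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value if a property appears more than once.
            self.properties.setdefault(prop, content)


def extract_preview(html_text):
    """Return the page title and thumbnail URL, or None where a tag is missing."""
    parser = OpenGraphParser()
    parser.feed(html_text)
    return {
        "title": parser.properties.get("og:title"),
        "image": parser.properties.get("og:image"),
    }


# Made-up HTML for illustration only.
sample_html = (
    '<html><head>'
    '<meta property="og:title" content="Example article">'
    '<meta property="og:image" content="https://example.com/cover.png">'
    '</head><body></body></html>'
)

print(to_markdown({"2023년 9월 28일": ["https://macoscontainers.org/"]}))
print(extract_preview(sample_html))
```

Pages without Open Graph tags would return `None` for both fields, so a fallback such as the `<title>` element may be needed.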

Idea

Issues

image

image

image

injae-kim commented 9 months ago

KakaoTalk_20231006_1604_34_156_group.txt