-
Note (13 February 2023): until further notice, the de facto place for discussion is #779.
————————————
So today I learned that [GitHub threads max out at 2,500 comments](https://github.com/Da…
-
Is there any way to use JAX with TPUs in coreless mode?
In TensorFlow you can just use `tf.device(None)` to use the TPU host's 300 GB of RAM and its CPU for bigger operations, but after looking at XLA, the bridge, …
-
I'd be curious about this:
Exploring Weight Agnostic Neural Networks
Tuesday, August 27, 2019
Posted by Adam Gaier, Student Researcher and David Ha, Staff Research Scientist, Google Research, Tokyo
…
-
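## 📌 Summary
- (Before) Why does adding a special token improve model performance? Doesn't the extra dimension just scatter the information instead?
- (After) **Key point: special tokens can be added by replacing the [unusedXXX] tokens in `vocab.txt`. However, as for our task being domain-specific…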
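
The `[unusedXXX]` replacement mentioned above could look roughly like the sketch below. It assumes a BERT-style `vocab.txt` with one token per line, already read into a list; the function name `replace_unused_tokens` and the tokens `[ENT]` and `[/ENT]` are hypothetical and only serve as illustration. Since each line is swapped in place, the number of vocabulary entries stays the same.

```python
import re

# Matches placeholder entries such as [unused0], [unused42] (BERT-style vocab assumed).
UNUSED_PATTERN = re.compile(r"^\[unused\d+\]$")


def replace_unused_tokens(vocab_lines, new_tokens):
    """Return a copy of vocab_lines where the first [unusedN] slots hold new_tokens.

    Tokens that already exist in the vocabulary are skipped, and the list
    length never changes because slots are replaced rather than appended.
    """
    existing = set(vocab_lines)
    pending = [tok for tok in new_tokens if tok not in existing]
    result = []
    for token in vocab_lines:
        if pending and UNUSED_PATTERN.match(token):
            result.append(pending.pop(0))  # reuse the placeholder slot
        else:
            result.append(token)
    if pending:
        raise ValueError(f"not enough [unusedN] slots for: {pending}")
    return result


# Hypothetical miniature vocabulary, standing in for the lines of vocab.txt.
vocab = ["[PAD]", "[unused0]", "[unused1]", "[UNK]", "[CLS]", "[SEP]", "hello"]
updated = replace_unused_tokens(vocab, ["[ENT]", "[/ENT]"])
print(updated)
```

Writing `updated` back to `vocab.txt` (one token per line) would keep every other token at its original index, which is the reason for reusing the placeholder slots instead of appending new lines.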
-
The snails on the README have always been curious to me. I wasn't around when that was produced - but it's definitely not a result any of us have been even close to producing yet. What's the story the…
-
Hi @saberkun, @zihangdai, @graykode, @bzantium
The original [zihangdai/XLNet](https://github.com/zihangdai/xlnet) repository hasn't received any updates recently. Should we assume that the XLNet impleme…
-
https://paperswithcode.com/sota/language-modelling-on-penn-treebank-word
GPT-3 has been released, improving the previous SOTA from 35.76 to 20.5, which is a huge gain.
BUT before the release of GPT…
-
* `make` fails on `Linux 4.15.0-52-generic`, currently the latest kernel for Ubuntu 18.04 LTS 64-bit, with Vanilla GNOME (used as a development and main workstation, plus some server services)
* *The package `inte…