who is currently working on a program aimed at using natural language processing technology to identify different Spanish dialects.
diffusion of the Spanish language started with the expansion of imperial Spain. Today, Spanish is spoken in the Americas, Europe, Africa, and Asia. Over time, Spanish has diverged into many different dialects, indigenous and non-indigenous. Varieties spoken in Spain include Castilian, alongside Catalan, which linguists classify as a separate Romance language rather than a dialect of Spanish. In Latin America, there are Mexican Spanish, Argentinian Spanish, Chilean Spanish, and many more. A common misconception is that all Spanish speakers speak a single, uniform language or dialect. In fact, Spanish is incredibly diverse, and its many variations underscore how little research and understanding there is of the language’s diversity. Studying Spanish and its morphology can unlock new insights and perspectives on the culture and sounds of the language. [1] [2] [3]
recognition machine that can distinguish among a small number of Spanish dialects. In my Spanish linguistics class, we have focused primarily on the differences in sounds between Spanish dialects. The program I aim to create could improve understanding across dialects by highlighting the subtle differences in pronunciation, syntax, and vocabulary.
on GitHub. The repository contains zipped audio files of speech in different languages. The files I am interested in cover Argentinian, Catalonian, Chilean, Colombian, Peruvian, Puerto Rican, and Venezuelan dialects. They are large zip files, published three years ago, containing short clips of Spanish speakers saying various short phrases about the weather. The repository is a free, publicly available resource whose contributing authors created these datasets for research and published them online for anyone to use. [4]
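As a first step with these archives, it may help to list the clips each one contains and how long they are. The sketch below is only an illustration under stated assumptions: it assumes the clips are stored as WAV files inside each zip, and the function names, the demo file name, and the 16 kHz sample rate are hypothetical rather than taken from the repository.

```python
import io
import wave
import zipfile


def clip_durations(archive):
    """Return {clip name: duration in seconds} for every .wav file in a zip archive.

    `archive` may be a file path or a binary file-like object.
    Assumption: clips are WAV files; anything else in the archive is skipped.
    """
    durations = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".wav"):
                continue  # skip transcripts, licenses, and other non-audio files
            # Read the clip into memory so the wave module gets a seekable stream.
            with wave.open(io.BytesIO(zf.read(name)), "rb") as clip:
                durations[name] = clip.getnframes() / clip.getframerate()
    return durations


def make_demo_archive():
    """Build a tiny in-memory zip holding one second of silence (demo data only)."""
    wav_bytes = io.BytesIO()
    with wave.open(wav_bytes, "wb") as clip:
        clip.setnchannels(1)
        clip.setsampwidth(2)        # 16-bit samples
        clip.setframerate(16000)    # hypothetical sample rate
        clip.writeframes(b"\x00\x00" * 16000)
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("demo_clip.wav", wav_bytes.getvalue())
        zf.writestr("LICENSE", "placeholder")
    archive.seek(0)
    return archive


if __name__ == "__main__":
    print(clip_durations(make_demo_archive()))
```

With the real data, `clip_durations` would be called once per downloaded zip file, and the archive it came from would serve as the dialect label for every clip inside it.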
them as a dialect. I will need to parse the audio and apply semantic analysis, as this will be a key part of the program’s ability to differentiate between the dialects. I would like my application to be a variant of the Google speech application available online. That application prompts the user to speak and lets them practice their pronunciation of different words, with Google checking whether each word was pronounced correctly. Its style is very simple and makes for a convenient user experience. I would like to create something visually similar to this feature.
quantitative approaches. I will record the accuracy of my program as a percentage to assess whether a machine can effectively classify different Spanish dialects. The qualitative approach will cover my overall observations and experience of combining technology and linguistics. By the end of the course, my project will ideally contribute to the work of linguists researching the many Spanish dialects and what makes each one different morphologically, phonetically, or grammatically.
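The accuracy percentage described above could be computed along the following lines. This is a minimal sketch, not the project's actual evaluation code: the function names and the example labels are hypothetical, and the per-dialect breakdown is an optional addition that shows which dialects the classifier confuses most often.

```python
from collections import Counter


def accuracy_percent(true_labels, predicted_labels):
    """Overall accuracy: the share of clips whose predicted dialect is correct, as a percentage."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true and predicted label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one labeled clip is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


def per_dialect_accuracy(true_labels, predicted_labels):
    """Accuracy for each dialect separately, as a percentage of that dialect's clips."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {dialect: 100.0 * hits[dialect] / totals[dialect] for dialect in totals}


if __name__ == "__main__":
    # Hypothetical labels for four clips; real labels would come from the dataset.
    truth = ["argentinian", "chilean", "chilean", "peruvian"]
    guess = ["argentinian", "chilean", "peruvian", "peruvian"]
    print(accuracy_percent(truth, guess))       # 75.0
    print(per_dialect_accuracy(truth, guess))
```

Reporting the per-dialect figures alongside the overall percentage guards against a high overall score that hides poor performance on one dialect with few clips.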
create it. The software and final code will also be publicly available online. I am very open to the idea of publishing my project on a preprint server, as one main goal of my project is to contribute to ongoing research in Spanish linguistics. I would like to present my project with a demo on my computer at Furman Engaged, in addition to the course presentation and poster presentation.
[2] Joseph A Wieczorek. Spanish dialects and the foreign language textbook: A sound perspective. Hispania, 74(1):175–181, 1991.
[3] Silvia Martinez. Dialectal variations in Spanish phonology: a literature review. Echo, 6(2):1–8, 2011.
[4] Multiple Contributors. Language-Resource Repository. ://github.com/google/language-resources?tab=readme-ov-file. [Online; accessed 24-January-2023].